stardelt Architecture
stardelt is an opinionated collection of open-source services, shipped together as a Kubernetes-native data platform. You bring the Kubernetes cluster; stardelt installs and wires the services that turn it into a lakehouse, a streaming platform, a notebook environment, and a BI surface — under a single UI.
This page is the high-level overview. Per-service detail (what each service does, upstream link, license) lives in Services.
What's in the box
Kubernetes — the platform you bring
stardelt is not a Kubernetes distribution. Any conformant cluster works: EKS, GKE, AKS, OpenShift, Rancher RKE2, k3s, kind, or a sovereign-cloud K8s (STACKIT, OVHcloud, IONOS, Hetzner, Open Telekom Cloud, Scaleway). Supported versions: the current Kubernetes minor release (N) and the one before it (N-1).
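As an illustration of the N / N-1 rule, here is a minimal sketch of a pre-install version check. It assumes that N means the current Kubernetes minor release and that the cluster reports a version string such as v1.31.2. The function names are hypothetical and not part of stardelt.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.31.2' or '1.30'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def is_supported(cluster_version: str, current_release: str) -> bool:
    """True when the cluster runs minor release N or N-1 of the same major.

    Assumption: 'current_release' is the Kubernetes minor release that
    counts as N; how that value is obtained is left open here.
    """
    major, minor = parse_minor(cluster_version)
    cur_major, cur_minor = parse_minor(current_release)
    return major == cur_major and minor in (cur_minor, cur_minor - 1)


if __name__ == "__main__":
    for candidate in ("v1.31.2", "1.30.4", "1.29.9"):
        print(candidate, is_supported(candidate, "1.31"))
```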
stardelt core stack — always installed
The opinionated set that defines what stardelt is. Every install ships these services:
- stardelt Nova — unified UI. SSO landing, catalog browser, lineage view, audit search, platform health, and deep-links into the native UI of every service below.
- stardelt Operator — CRD reconciler / control plane. Composes the services below from a small declarative API.
- Trino — distributed SQL engine; fast interactive queries over Iceberg.
- Apache Spark (Spark Connect) — distributed compute. Spark Connect exposes a persistent remote endpoint so notebooks and other clients connect without spawning their own driver.
- Apache Airflow — workflow orchestration for batch and scheduled jobs.
- JupyterHub — multi-user notebook environment; the front door for interactive analysis.
- Apache Superset — BI and dashboards over Trino.
- Apache Iceberg — open table format; the storage abstraction every engine reads and writes.
- Lakekeeper — Iceberg REST catalog; vends table metadata and short-lived object-store credentials.
- SeaweedFS — S3-compatible object storage; the default data plane.
- Apache Kafka (KRaft) — event-streaming backbone for ingest, CDC, and audit events.
Opt-in services — enabled per operator choice
These extend stardelt but aren't required. Operators often have existing equivalents already in the cluster:
- Keycloak — OIDC / SAML / LDAP provider. Skip if your cluster already federates an IdP.
- Prometheus + Grafana — metrics and dashboards. Skip if your cluster already has a monitoring stack.
- Alertmanager — alert routing, paired with Prometheus.
- cert-manager — TLS certificate lifecycle.
- Argo CD — GitOps delivery.
- Apache SeaTunnel — data integration / EL connectors.
- Apache Flink — stream processing for event-driven pipelines and CDC.
- Open Policy Agent (OPA) — fine-grained policy enforcement.
See Services for the per-service description and upstream links.
How the services connect
The canonical lakehouse read path:
 client (BI / notebook / Spark job)
             │
             │ query
             ▼
┌──────────────────────────┐
│  Trino / Spark Connect   │
└────────────┬─────────────┘
             │ Iceberg REST + OIDC token
             ▼
┌──────────────────────────┐
│        Lakekeeper        │
│  (Iceberg REST catalog)  │
└────────────┬─────────────┘
             │ table metadata + short-lived S3 creds
             ▼
┌──────────────────────────┐
│        SeaweedFS         │
│  (or BYO S3-compatible)  │
└────────────┬─────────────┘
             │ Parquet files
             ▼
      (read by engine)
Apache Kafka carries audit events and (when ingest pipelines are wired up) source-of-truth events that Airflow batches or Flink processes into Iceberg tables. JupyterHub talks to Trino for SQL and to Spark Connect for heavier compute; Superset talks to Trino for dashboards.
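The read path above can be walked through with a small in-memory model. This is only a sketch: the classes stand in for the real services, the 15-minute credential lifetime is an assumed value, and mapping a metadata location directly to data files skips the manifest layers a real Iceberg table has. None of these names come from stardelt or Lakekeeper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TableAccess:
    """What the catalog hands back to the engine for one table."""
    metadata_location: str   # where the table metadata lives
    access_key: str          # short-lived object-store credential
    expires_at: datetime


class Catalog:
    """Stand-in for the Iceberg REST catalog in the diagram."""

    def __init__(self, tables: dict[str, str], valid_tokens: set[str]):
        self._tables = tables
        self._valid_tokens = valid_tokens

    def load_table(self, name: str, oidc_token: str, now: datetime) -> TableAccess:
        # The engine presents the caller's OIDC token with the request.
        if oidc_token not in self._valid_tokens:
            raise PermissionError("catalog rejected the OIDC token")
        return TableAccess(
            metadata_location=self._tables[name],
            access_key=f"temp-{name}",
            expires_at=now + timedelta(minutes=15),  # assumed lifetime
        )


class ObjectStore:
    """Stand-in for SeaweedFS or another S3-compatible store."""

    def __init__(self, objects: dict[str, list[str]]):
        self._objects = objects

    def list_data_files(self, access: TableAccess, now: datetime) -> list[str]:
        # Credentials vended by the catalog stop working once they expire.
        if now >= access.expires_at:
            raise PermissionError("object-store credential expired")
        return self._objects[access.metadata_location]


def run_query(catalog: Catalog, store: ObjectStore, table: str,
              oidc_token: str, now: datetime) -> list[str]:
    """Engine side: ask the catalog first, then read from the object store."""
    access = catalog.load_table(table, oidc_token, now)
    return store.list_data_files(access, now)


if __name__ == "__main__":
    location = "s3://lake/sales/orders/metadata.json"
    catalog = Catalog({"sales.orders": location}, {"good-token"})
    store = ObjectStore({location: ["part-0.parquet"]})
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(run_query(catalog, store, "sales.orders", "good-token", now))
```

The point of the split is that the engine never holds long-lived storage keys: it only ever sees credentials scoped to the table it asked for, and only after the catalog has accepted the token.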
Operating model
stardelt follows the operator-per-service pattern: each service is managed by its own well-maintained upstream operator (Strimzi for Kafka, Spark Operator for Spark, CloudNative-PG for Postgres backing Lakekeeper, etc.). On top sits the stardelt Operator, which reconciles a small set of top-level CRDs into the per-service CRDs underneath. Operators run as standard Kubernetes controllers — no SaaS side, no phone-home.
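The layering described here can be sketched as a pure function that expands one top-level resource into per-service child resources, plus a diff against what already exists. The top-level shape (a Platform resource with spec.services) and the managed-by label value are hypothetical. The two child kinds are the ones Strimzi and CloudNative-PG define upstream, but the mapping is illustrative and not the stardelt Operator's actual logic.

```python
# Child kinds owned by two of the upstream operators named above.
# Which services map to which kinds is an assumption of this sketch.
CHILD_KINDS = {
    "kafka": ("kafka.strimzi.io/v1beta2", "Kafka"),
    "postgres": ("postgresql.cnpg.io/v1", "Cluster"),
}


def desired_children(platform: dict) -> list[dict]:
    """Expand a (hypothetical) top-level resource into per-service resources."""
    name = platform["metadata"]["name"]
    namespace = platform["metadata"].get("namespace", "default")
    children = []
    for service, settings in sorted(platform["spec"]["services"].items()):
        if not settings.get("enabled", True):
            continue  # disabled services produce no child resource
        api_version, kind = CHILD_KINDS[service]
        children.append({
            "apiVersion": api_version,
            "kind": kind,
            "metadata": {
                "name": f"{name}-{service}",
                "namespace": namespace,
                "labels": {"app.kubernetes.io/managed-by": "stardelt-operator"},
            },
            # Service-specific settings are passed through untouched.
            "spec": settings.get("spec", {}),
        })
    return children


def plan(desired: list[dict], existing: set[tuple[str, str]]) -> dict:
    """Diff desired children against existing ones, keyed by (kind, name)."""
    wanted = {(c["kind"], c["metadata"]["name"]) for c in desired}
    return {
        "create": sorted(wanted - existing),
        "delete": sorted(existing - wanted),
    }


if __name__ == "__main__":
    platform = {
        "apiVersion": "stardelt.io/v1alpha1",  # hypothetical group/version
        "kind": "Platform",                    # hypothetical kind
        "metadata": {"name": "demo", "namespace": "data"},
        "spec": {"services": {"kafka": {}, "postgres": {"enabled": False}}},
    }
    print(plan(desired_children(platform), {("Cluster", "demo-postgres")}))
```

A real controller would apply the plan through the Kubernetes API and leave the child resources to their own operators. Keeping the expansion a pure function makes that step easy to test in isolation.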
Multi-tenancy
Multi-tenancy primitives exist in v1 even though MVP deployments are single-tenant. This avoids a rewrite later:
- Every stardelt resource carries a stardelt.io/tenant label.
- Audit events carry tenant_id.
- Tenant namespaces can be isolated by Kubernetes NetworkPolicies.
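To make the namespace-isolation primitive concrete, here is a sketch that builds a Kubernetes NetworkPolicy manifest allowing ingress only from namespaces that carry the same tenant label. It assumes tenant namespaces themselves are labelled with stardelt.io/tenant. The policy name and the helper function are hypothetical.

```python
import json

TENANT_LABEL = "stardelt.io/tenant"


def tenant_isolation_policy(namespace: str, tenant: str) -> dict:
    """Build a NetworkPolicy that admits traffic only from the same tenant.

    Assumption: every namespace belonging to the tenant carries the
    stardelt.io/tenant label with the tenant's name as its value.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": "tenant-isolation",  # hypothetical name
            "namespace": namespace,
            "labels": {TENANT_LABEL: tenant},
        },
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {"namespaceSelector": {"matchLabels": {TENANT_LABEL: tenant}}}
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # JSON is accepted by kubectl apply just like YAML.
    print(json.dumps(tenant_isolation_policy("team-a", "team-a"), indent=2))
```

Because the policy lists only Ingress in policyTypes, outbound traffic from the tenant's pods is left unrestricted. A stricter setup would add an Egress rule as well.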
Sovereignty
Full commitments in SOVEREIGNTY.md. Architectural enforcement:
- No mandatory outbound calls. Every release ships a documented network-egress matrix.
- Air-gap install profile. All images at ghcr.io/stardelt/... plus mirrors; an images.tar bundle ships with each release.
- No license server. There isn't one. There will never be one.
- Opt-in telemetry only. Off by default.
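For an air-gapped install, image references have to resolve to a registry inside the network. The sketch below rewrites references under the ghcr.io/stardelt/ prefix to a private mirror and reports any image that would still need outbound access. The mirror hostname and the image names in the example are made up, and the stardelt installer may handle this differently.

```python
UPSTREAM_PREFIX = "ghcr.io/stardelt/"


def mirror_image(image: str, mirror: str) -> str:
    """Point a stardelt image reference at a mirror; leave other images as-is."""
    if not image.startswith(UPSTREAM_PREFIX):
        return image
    return mirror.rstrip("/") + "/" + image[len(UPSTREAM_PREFIX):]


def not_mirrored(images: list[str], mirror: str) -> list[str]:
    """Return images that would still be pulled from outside the mirror."""
    prefix = mirror.rstrip("/") + "/"
    return [img for img in images if not mirror_image(img, mirror).startswith(prefix)]


if __name__ == "__main__":
    mirror = "registry.internal/stardelt"  # hypothetical private registry
    images = [
        "ghcr.io/stardelt/nova:1.0.0",     # hypothetical image name and tag
        "docker.io/library/postgres:16",
    ]
    print([mirror_image(img, mirror) for img in images])
    print(not_mirrored(images, mirror))
```

A check like not_mirrored is one way to verify a deployment against the network-egress matrix before cutting off outbound access.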
Compliance
stardelt is evidence-gathering infrastructure, not a certified product. Later phases ship control-mapping starter kits for SOC 2, ISO 27001, BSI C5, and FedRAMP-on-your-own-cluster. Certifications remain the operator's responsibility.