stardelt Roadmap
stardelt is a multi-year vision. Each phase produces something demoable and usable on its own — not "half-built monolith until v1.0."
The MVP (Phase 0 + Phase 1) is the only milestone needed to validate the bet. Phases 2–6 are gated on community pull.
Phase 0 — Now — Research, design, branding
Goal: make the case strong enough to recruit collaborators.
Deliverables (this repository):
- Design spec
- Architecture
- Services with license verification
- License policy — accepted, exception, rejected
- Sovereignty commitments
- Roadmap — this document
- README — marketing-ready
- Brand: logo, wordmark
No code yet. This is intentional. We get the design right, recruit the team, then build.
Phase 1 — Lakehouse MVP
Goal: kubectl apply -f lakehouse.yaml → 15 minutes later, you're querying an Iceberg table in Trino through stardelt Nova.
This is the wedge. A single working pillar with the composed control plane proves the differentiator over Stackable.
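To make the Phase 1 goal concrete, here is a minimal sketch of what a `lakehouse.yaml` resource might contain, built as a Python dict and printed as JSON (which kubectl accepts alongside YAML). The Lakehouse CRD schema is not defined yet, so the API group `stardelt.example/v1alpha1` and every field under `spec` are assumptions for illustration only. The component names come from the Phase 1 scope.

```python
import json
from typing import Optional


def lakehouse_manifest(name: str, tenant: str,
                       byo_s3_endpoint: Optional[str] = None) -> dict:
    """Build an illustrative Lakehouse custom resource as a plain dict.

    All field names are hypothetical; no CRD schema has been published.
    """
    # Default to the bundled SeaweedFS; switch to the BYO-S3 path if an
    # endpoint is supplied.
    if byo_s3_endpoint:
        storage = {"type": "byo-s3", "endpoint": byo_s3_endpoint}
    else:
        storage = {"type": "seaweedfs"}

    return {
        "apiVersion": "stardelt.example/v1alpha1",  # assumed group/version
        "kind": "Lakehouse",
        "metadata": {"name": name},
        "spec": {
            "tenant": tenant,
            "storage": storage,
            "catalog": {"type": "lakekeeper"},
            "queryEngines": ["trino", "duckdb"],
        },
    }


if __name__ == "__main__":
    # The JSON output could be piped to `kubectl apply -f -`.
    print(json.dumps(lakehouse_manifest("demo", "acme"), indent=2))
```

Keeping the resource this small is deliberate in the sketch: the operator, not the user, would be responsible for wiring the catalog, storage, and query engines together.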
Scope:
- L1 (substrate, minimum viable): Keycloak, K8s Secrets (OpenBao deferred to Phase 5), VictoriaMetrics + VictoriaLogs + Perses + Jaeger, OpenCost
- L2 (data foundation): SeaweedFS (+ BYO-S3 path), Lakekeeper, Apache Iceberg, OPA + OpenFGA bootstrap, OpenLineage emit-only
- L3 Pillar 1 (Lakehouse SQL only): Trino, DuckDB, Apache Superset, JupyterHub
- L4 (stardelt-native): stardelt-platform-operator (only PlatformInstance, Tenant, Lakehouse CRDs), basic stardelt-secret-operator, basic stardelt-listener-operator, Nova MVP (SSO landing + catalog browser + simple cost view + audit search), stardelt CLI
Validation:
- Single-cluster, single-tenant.
- kind/k3d demo for laptops; documented install path for real clusters.
- First 5 pilot users.
Phase 2 — Batch ETL pillar
Adds:
- Apache Spark via Spark Operator
- Apache Airflow + Argo Workflows
- dbt-core + SQLMesh
- Apache SeaTunnel
- Pipeline CRD
- Nova: pipelines view, DAG visualization, run history
Phase 3 — Streaming pillar
Adds:
- Apache Kafka via Strimzi
- Apicurio Registry
- Apache Flink
- RisingWave
- Debezium (CDC)
- StreamApp CRD
- Nova: streams view, topic browser, materialized-view explorer
Phase 4 — ML / AI pillar
Adds:
- Ray via KubeRay
- Kubeflow Pipelines
- MLflow
- Feast (feature store)
- KServe + vLLM (model + LLM serving)
- Qdrant (vector DB)
- Envoy AI Gateway (LLM gateway)
- MLWorkspace CRD
- Nova: ML workspace view, model registry, serving endpoints, vector-DB browser
Phase 5 — Production hardening
This is where stardelt earns the "battle-tested" claim.
Adds:
- OpenBao replaces the K8s Secrets path
- Multi-AZ HA, documented RTO ≤ 1h / RPO ≤ 15min
- Air-gap install profile (Harbor mirror bundle, images.tar)
- Sovereign-cloud CI matrix: STACKIT, OVHcloud, IONOS, Hetzner, Open Telekom Cloud, Scaleway alongside EKS / GKE / AKS / OpenShift
- Audit retention into Iceberg (long-term, queryable via Trino)
- Compliance starter kits: SOC2, ISO27001, BSI C5, FedRAMP-on-your-own-cluster control mappings
- Performance track — Apache Gluten + Velox for Spark/Trino native vectorized execution (the OSS answer to proprietary vectorized engines)
- First 10 documented production references
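The RTO and RPO targets in the list above only mean something if they are checked against real failover drills. The following is a minimal sketch of such a check. The two thresholds come from this roadmap; the `DrillResult` type and its fields are hypothetical names introduced for illustration, not part of any stardelt tooling.

```python
from dataclasses import dataclass
from datetime import timedelta

RTO_TARGET = timedelta(hours=1)      # roadmap target: RTO <= 1h
RPO_TARGET = timedelta(minutes=15)   # roadmap target: RPO <= 15min


@dataclass(frozen=True)
class DrillResult:
    """Outcome of one failover drill (hypothetical record shape)."""
    data_loss_window: timedelta   # age of the newest recoverable state at failure time
    time_to_restore: timedelta    # from outage start until the service is healthy again


def meets_targets(drill: DrillResult) -> bool:
    """Return True only if both the RTO and the RPO target are met."""
    return (drill.time_to_restore <= RTO_TARGET
            and drill.data_loss_window <= RPO_TARGET)


if __name__ == "__main__":
    ok = DrillResult(data_loss_window=timedelta(minutes=10),
                     time_to_restore=timedelta(minutes=45))
    late = DrillResult(data_loss_window=timedelta(minutes=20),
                       time_to_restore=timedelta(minutes=45))
    print(meets_targets(ok), meets_targets(late))
```

In this sketch a drill that restores in 45 minutes but loses 20 minutes of data fails, because the RPO bound is checked independently of the RTO bound.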
Phase 6 — Hosted stardelt (optional, commercial)
Only happens if community pull justifies it. The multi-tenancy primitives from Phase 1 already exist (Tenant CRD, OpenFGA realms per tenant, cost attribution by tenant), so the work is operational, not architectural:
- Managed control plane
- Self-service onboarding
- Billing
- Support tier
The hosted version remains optional. Self-hosting will always be the canonical path.
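As an illustration of the cost attribution by tenant that Phase 6 would build on, the sketch below rolls up per-workload cost rows into per-tenant totals. The row shape (a `labels` dict and a `cost` number) and the `tenant` label key are assumptions made for this example; they are not the OpenCost API or a defined stardelt format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

UNATTRIBUTED = "__unattributed__"  # bucket for workloads without a tenant label


def cost_by_tenant(rows: Iterable[Mapping], tenant_label: str = "tenant") -> Dict[str, float]:
    """Sum workload costs per tenant, keyed by a (hypothetical) tenant label."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        tenant = row.get("labels", {}).get(tenant_label, UNATTRIBUTED)
        totals[tenant] += row["cost"]
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"labels": {"tenant": "acme"}, "cost": 1.5},
        {"labels": {"tenant": "acme"}, "cost": 2.25},
        {"labels": {"tenant": "globex"}, "cost": 4.0},
        {"labels": {}, "cost": 0.5},  # no tenant label set
    ]
    print(cost_by_tenant(sample))
```

Surfacing unlabelled spend in its own bucket, rather than dropping it, keeps the per-tenant totals reconcilable with the cluster-wide total.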
What we are not doing in v1
- No fully managed SaaS (Phase 6 only, optional).
- No proprietary query engine. Trino + DuckDB are good enough for ~95% of real workloads.
- No OLTP / Postgres replacement.
- No replacement for the underlying Kubernetes cluster. Bring your own.
- No "we beat the proprietary incumbents on TPC-DS" claims. The pitch is predictable cost, sovereignty, and openness, not raw speed. Performance work is scheduled for Phase 5.
- No multi-cluster federation in v1. Phase 5+ at earliest.
- No edge / small-footprint mode in v1. Interesting future; out of scope today.
How decisions are made
Until governance is formalized (target: end of Phase 1), decisions are made by consensus of active maintainers, anchored in:
- This roadmap and the design spec.
- The license policy in LICENSES.md.
- The sovereignty commitments in SOVEREIGNTY.md.
Roadmap changes are made via pull request to this document. Significant scope changes require an issue and a discussion period.