You don't need Databricks to have a functional data platform. For small teams, simplicity beats sophistication every time.
The stack
- Ingestion: Airbyte or Fivetran for SaaS sources, custom Python for everything else
- Storage: S3 + Iceberg for the lakehouse, Postgres for operational data
- Transformation: dbt Core, running on a schedule via GitHub Actions
- Orchestration: Dagster or Prefect (hosted on a single EC2 instance)
- BI: Metabase or Lightdash (open-source, self-hosted)
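To illustrate the "custom Python for everything else" path, here is a minimal sketch of an incremental extractor. It makes three assumptions. First, a hypothetical `fetch_page(cursor)` function wraps your source's paginated API and returns `(records, next_cursor)`. Second, each record carries an `updated_at` timestamp as a UTC ISO 8601 string in one fixed format, so plain string comparison orders them correctly. Third, the watermark from the previous run is stored somewhere between runs. Writing the results to S3 or Postgres is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical page fetcher: takes a cursor (None for the first page) and returns
# (records, next_cursor), where next_cursor is None once the source is exhausted.
PageFetcher = Callable[[Optional[str]], Tuple[List[Dict], Optional[str]]]


def extract_since(fetch_page: PageFetcher, watermark: str) -> Tuple[List[Dict], str]:
    """Walk every page and keep records updated after `watermark`.

    Returns the new records and the updated watermark (the largest
    `updated_at` seen, or the old watermark if nothing changed).
    Assumes `updated_at` is a fixed-format UTC ISO 8601 string, so string
    comparison matches time order. The strict `>` skips records stamped
    exactly at the old watermark; use `>=` plus de-duplication if your
    source can emit several updates with the same timestamp.
    """
    new_records: List[Dict] = []
    new_watermark = watermark
    cursor: Optional[str] = None
    while True:
        records, cursor = fetch_page(cursor)
        for record in records:
            if record["updated_at"] > watermark:
                new_records.append(record)
                new_watermark = max(new_watermark, record["updated_at"])
        if cursor is None:
            break
    return new_records, new_watermark


# Usage with an in-memory fake source (illustrative data only).
PAGES = {
    None: ([{"id": 1, "updated_at": "2024-01-01T00:00:00Z"}], "p2"),
    "p2": ([{"id": 2, "updated_at": "2024-01-03T00:00:00Z"}], None),
}
rows, wm = extract_since(lambda c: PAGES[c], "2024-01-02T00:00:00Z")
print([r["id"] for r in rows], wm)  # [2] 2024-01-03T00:00:00Z
```

If each scheduled run persists the returned watermark, for example in a small Postgres table, the next run pulls only what changed since the previous one.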
What you're giving up
This stack won't give you real-time streaming, automatic scaling, or a unified metadata catalog. But for teams of 2–10 engineers running batch workloads, it's more than enough.
The one rule
Every table must have an owner and a freshness SLA documented in its dbt config. Without this, your platform accumulates zombie tables that nobody trusts.
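One way to enforce this rule is a CI check that fails when a model lacks either field. The sketch below assumes a convention of storing both values in the model's `meta` config, under the hypothetical keys `owner` and `freshness_sla_hours`. It reads a dict shaped like the `nodes` mapping in dbt's `manifest.json`. The artifact layout can differ between dbt versions, so verify the field paths against your own manifest. A small helper then shows how an SLA expressed in hours becomes a staleness check.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def find_models_missing_contract(nodes: Dict[str, dict]) -> List[str]:
    """Return unique_ids of dbt models that lack an owner or a freshness SLA."""
    missing = []
    for unique_id, node in nodes.items():
        # Only models are checked; tests, seeds, snapshots, etc. are skipped.
        if node.get("resource_type") != "model":
            continue
        # Merge top-level meta with config.meta; config values win on conflicts.
        meta = {**(node.get("meta") or {}), **((node.get("config") or {}).get("meta") or {})}
        # Hypothetical convention: meta.owner and meta.freshness_sla_hours.
        if not meta.get("owner") or "freshness_sla_hours" not in meta:
            missing.append(unique_id)
    return sorted(missing)


def is_stale(last_loaded_at: datetime, sla_hours: float, now: datetime) -> bool:
    """True when the last load is older than the SLA allows."""
    return now - last_loaded_at > timedelta(hours=sla_hours)


# Usage with a hand-written manifest fragment (illustrative data, not real dbt output).
nodes = {
    "model.shop.orders": {
        "resource_type": "model",
        "config": {"meta": {"owner": "data-eng", "freshness_sla_hours": 24}},
    },
    "model.shop.legacy_users": {"resource_type": "model", "config": {"meta": {}}},
    "test.shop.not_null_orders_id": {"resource_type": "test", "config": {"meta": {}}},
}
missing = find_models_missing_contract(nodes)
print(missing)  # ['model.shop.legacy_users']

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), 24, now))  # True
```

For sources, dbt Core already has native freshness checks: a `freshness` block with `warn_after`/`error_after`, run via `dbt source freshness`. A custom check like this one mainly fills the gap for models.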
A data platform is only as good as the trust your team has in its outputs. Start simple, earn trust, then scale.