oso
Blog: Our Data Engineering Journey
What is it?
We have gone through a bunch of phases in our technical architecture. Each was a reasonable decision in the moment, but we moved from one to the next in rapid succession:
- Custom indexer running on GitHub Actions writing to Postgres using Prisma, then TypeORM (Sept-Oct 2023 / RF3)
  - Quick and dirty, but it got an MVP working that helped get our foot in the door
  - Not that different in architecture from many other dashboard / web server setups (e.g. DefiLlama, Augur, etc.)
  - Pros: very easy to implement, low complexity, cheap to run
  - Cons: scale, etc.
- Custom indexer running on GitHub Actions writing to Timescale (Nov-Dec 2023)
  - Helped us improve performance a bit, but ultimately fell over when running live queries on the event table
  - We'd only pre-aggregate at indexing time to events_daily_to_artifact/project.
- dbt running on BigQuery, copying marts to CloudSQL (Jan 2024)
  - Way better, and worked well for medium-sized workloads, but expensive.
  - Allowed us to pre-compute arbitrary aggregations/metrics/analytics
- dbt running on BigQuery, copying marts to ClickHouse (Jul 2024)
  - CloudSQL eventually had issues at a certain scale. Copy jobs would fail and indices would take forever to build. Without indices, queries were constantly timing out.
  - ClickHouse automagically handled serving way better.
- dbt running on BigQuery, sqlmesh running on ClickHouse (Aug 2024)
  - Everyone kept swearing that with ClickHouse we wouldn't need a data warehouse anymore.
  - dbt on BigQuery was a non-starter for timeseries metrics, both because of cost and because the pipeline would crash.
  - Got some rolling window metrics to run on sqlmesh at reasonable cost, but without better delete support, sqlmesh performance meant things would take forever.
- dbt running on BigQuery, sqlmesh running on Trino/Iceberg lakehouse, copying marts to ClickHouse (Oct 2024)
- ???
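To make the pre-aggregation and rolling-window points above a bit more concrete, here is a rough sketch in plain Python. It is not the actual pipeline code: the event shape, the function names (`aggregate_daily`, `rolling_sum`, `replace_day`), and the sample data are all made up for illustration. It shows the general idea of collapsing raw events into daily rows so that windowed metrics never touch the raw event table. It also shows one plausible reason delete support matters (an assumption, not something stated above): recomputing a day means removing its old rows before inserting fresh ones.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw events as (day, artifact, amount). Illustrative only.
RAW_EVENTS = [
    (date(2024, 1, 1), "repo-a", 2),
    (date(2024, 1, 1), "repo-a", 1),
    (date(2024, 1, 2), "repo-a", 4),
    (date(2024, 1, 4), "repo-a", 5),
    (date(2024, 1, 2), "repo-b", 7),
]


def aggregate_daily(events):
    """Collapse raw events into one total per (day, artifact)."""
    daily = defaultdict(int)
    for day, artifact, amount in events:
        daily[(day, artifact)] += amount
    return dict(daily)


def rolling_sum(daily, artifact, end_day, window_days):
    """Sum daily totals for `artifact` over the window ending on `end_day` (inclusive)."""
    total = 0
    for offset in range(window_days):
        total += daily.get((end_day - timedelta(days=offset), artifact), 0)
    return total


def replace_day(daily, day, events):
    """Recompute a single day: drop its existing rows, then insert fresh aggregates."""
    kept = {key: value for key, value in daily.items() if key[0] != day}
    kept.update(aggregate_daily([e for e in events if e[0] == day]))
    return kept


daily = aggregate_daily(RAW_EVENTS)
three_day = rolling_sum(daily, "repo-a", date(2024, 1, 4), 3)

# A late-arriving event for Jan 2 forces that day to be rebuilt.
late_events = RAW_EVENTS + [(date(2024, 1, 2), "repo-a", 10)]
updated = replace_day(daily, date(2024, 1, 2), late_events)
```

The rolling metric only reads the small daily table, which is the point of pre-aggregating. The `replace_day` step is the part that gets slow when the underlying store makes deletes expensive.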
Anyway, we don't need to write this blog post yet since we're still evolving, but at some point I think the HN community would love this.