
Blog: Our Data Engineering Journey

Open ryscheng opened this issue 1 year ago • 0 comments

What is it?

Our technical architecture has gone through a bunch of phases. Each was a reasonable decision in the moment, but we moved on from each one quickly:

  1. Custom indexer running on GitHub Actions, writing to Postgres using Prisma, then TypeORM (Sept-Oct 2023 / RF3)
  • Quick and dirty; got an MVP working that helped get our foot in the door
  • Architecturally not that different from many other dashboard / web server setups (e.g. DefiLlama, Augur, etc.)
  • Pros: very easy to implement, low complexity, cheap to run
  • Cons: scale, etc.
  2. Custom indexer running on GitHub Actions, writing to Timescale (Nov-Dec 2023)
  • Helped us improve performance a bit, but ultimately fell over when running live queries on the event table
  • We'd only pre-aggregate at indexing time to events_daily_to_artifact/project.
  3. dbt running on BigQuery, copying marts to CloudSQL (Jan 2024)
  • Way better; worked well for medium-sized workloads. Expensive.
  • Allowed us to pre-compute arbitrary aggregations/metrics/analytics
  4. dbt running on BigQuery, copying marts to ClickHouse (Jul 2024)
  • CloudSQL eventually had issues at a certain scale. Copy jobs would fail and indices would take forever to build. Without indices, queries were constantly timing out.
  • ClickHouse automagically handled the serving infra way better.
  5. dbt running on BigQuery, sqlmesh running on ClickHouse (Aug 2024)
  • Everyone kept swearing that with ClickHouse we wouldn't need a data warehouse anymore.
  • dbt on BigQuery was a non-starter for timeseries metrics, both because of cost and because the pipeline would crash.
  • Got some rolling window metrics to run on sqlmesh at reasonable cost, but without better delete support, sqlmesh was slow enough that things would take forever.
  6. dbt running on BigQuery, sqlmesh running on Trino/Iceberg lakehouse, copying marts to ClickHouse (Oct 2024)

  7. ???
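
For readers unfamiliar with the two ideas that keep recurring in the list above (pre-aggregating raw events into daily buckets, then computing rolling window metrics on top of those buckets), here is a minimal Python sketch. It is only an illustration, not our actual pipeline: the function names, the `(day, artifact, amount)` event shape, and the sample data are all made up for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_daily(events):
    """Collapse raw (day, artifact, amount) events into one total per artifact per day.

    Hypothetical stand-in for pre-aggregating at indexing time.
    """
    daily = defaultdict(float)
    for day, artifact, amount in events:
        daily[(artifact, day)] += amount
    return dict(daily)


def rolling_sum(daily, artifact, end_day, window_days):
    """Sum one artifact's daily totals over the window ending on end_day (inclusive)."""
    total = 0.0
    for offset in range(window_days):
        total += daily.get((artifact, end_day - timedelta(days=offset)), 0.0)
    return total


# Made-up sample events: (day, artifact, amount)
events = [
    (date(2024, 1, 1), "repo-a", 3),
    (date(2024, 1, 1), "repo-a", 2),
    (date(2024, 1, 2), "repo-a", 4),
    (date(2024, 1, 5), "repo-a", 1),
    (date(2024, 1, 2), "repo-b", 7),
]

daily = aggregate_daily(events)
print(rolling_sum(daily, "repo-a", date(2024, 1, 5), 4))
```

The point of the daily table is that the rolling metric reads one row per artifact per day instead of scanning every raw event, which is roughly why live queries on the raw event table were the first thing to fall over.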

Anyway, we don't need to write this blog post yet; we're still evolving. But at some point I think the HN community would love this.

ryscheng · Sep 28 '24 00:09