Replace BigQuery with Lakehouse/Icehouse
What is it? https://delta.io/
Why? https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf
Happy to stick with BigQuery public datasets until this becomes a stronger need.
We can also consider Apache Iceberg instead of Delta Lake.
This would be for data upstream from the events table.
We should try to preserve the benefits we get currently from BigQuery:
- Reader-pays public read access
- Some way to run reader-pays batch queries
Ever since we solved https://github.com/opensource-observer/oso/issues/821, it's an open question whether we should move more of our data pipeline to sqlmesh + Trino + Iceberg instead of dbt + BigQuery. This issue can track that work.
Rescoping.
Since we are migrating to sqlmesh in #2559, we should consider just using Trino with the BigQuery connector for query processing.
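For reference, here is a minimal sketch of what pointing Trino at BigQuery could look like. It renders a catalog properties file using the connector's `bigquery.project-id` (where the data lives) and `bigquery.parent-project-id` (the project that gets billed), which might be one way to keep something like reader-pays. The helper name and the project IDs are hypothetical placeholders, and a real setup would also need credentials configured.

```python
def render_bigquery_catalog(data_project: str, billing_project: str) -> str:
    """Render the contents of a Trino catalog file for the BigQuery connector.

    The output is meant to be saved as something like
    etc/catalog/bigquery.properties (path is an assumption about the deployment).
    """
    properties = {
        # Selects the BigQuery connector for this catalog.
        "connector.name": "bigquery",
        # Project that holds the datasets being read.
        "bigquery.project-id": data_project,
        # Project that is billed for the reads.
        "bigquery.parent-project-id": billing_project,
    }
    return "\n".join(f"{key}={value}" for key, value in properties.items()) + "\n"


if __name__ == "__main__":
    # Placeholder project IDs, not real ones.
    print(render_bigquery_catalog("example-data-project", "example-billing-project"), end="")
```

With a catalog like this in place, sqlmesh models could query BigQuery tables through Trino as `bigquery.<dataset>.<table>`, assuming the catalog file is named `bigquery.properties`.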
I might just close this issue since it's a bit outdated at this point. We did it! We just need to migrate models from dbt to sqlmesh now.