Replace BigQuery with Lakehouse/Icehouse

Open ryscheng opened this issue 1 year ago • 2 comments

What is it?

https://delta.io/

Why? https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf

Happy to stick with BigQuery public data sets for now until this becomes a stronger need.

ryscheng, Apr 09 '24 04:04

We can also consider Apache Iceberg instead of Delta Lake.

This would be for data upstream from the events table.

We should try to preserve the benefits we get currently from BigQuery:

  • Reader-pays public read access
  • Some way to run reader-pays batch queries

ryscheng, Aug 21 '24 01:08

Now that we have solved https://github.com/opensource-observer/oso/issues/821, it's an open question whether we should move more of our data pipeline to sqlmesh + Trino + Iceberg instead of dbt + BigQuery. This issue can track that work.

ryscheng, Oct 02 '24 21:10

Rescoping.

Since we are migrating to sqlmesh in #2559, we should consider just using Trino with the BigQuery connector for query processing.

ryscheng, Dec 05 '24 18:12

I might just close this issue since it's a bit outdated at this point. We did it! We just need to migrate models from dbt to sqlmesh now.

ryscheng, Jan 10 '25 15:01