Trino for distributed queries

Open ryscheng opened this issue 1 year ago • 4 comments

What is it?

In the future, we may have data spread out among a bunch of places (e.g. BigQuery, ClickHouse, Postgres, random files, IPFS). Trino seems like an interesting option for running distributed queries across them: https://trino.io/

ryscheng avatar Feb 06 '24 17:02 ryscheng

Looking at the docs, this is pretty interesting. You can set up data connectors that:

  • Run queries over the BigQuery Storage API
  • Forward queries to a ClickHouse or Snowflake instance

Then you join it all together in a unified interface. This will be useful if our data actually ends up spread across a bunch of locations.
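
To make the "join it all together" idea concrete, here is a minimal sketch of what a cross-source query might look like. Trino addresses tables as `catalog.schema.table`, where each catalog is backed by one connector. The catalog, schema, table, and column names below are hypothetical placeholders, not our actual setup, and the helper functions are just for illustration.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a quoted, fully qualified Trino table name: catalog.schema.table."""
    return ".".join(f'"{part}"' for part in (catalog, schema, table))


def federated_join_sql(left: str, right: str, key: str) -> str:
    """Build a join across two tables that may live in different catalogs.

    Trino resolves each side through the connector behind its catalog,
    so the SQL itself looks like an ordinary join.
    """
    return (
        f"SELECT l.{key}, count(*) AS n "
        f"FROM {left} AS l "
        f"JOIN {right} AS r ON l.{key} = r.{key} "
        f"GROUP BY l.{key}"
    )


# Hypothetical names: assumes catalogs called "bigquery" and "clickhouse"
# have been configured on the Trino cluster.
events = qualified("bigquery", "oso", "events")
projects = qualified("clickhouse", "default", "projects")
sql = federated_join_sql(events, projects, "project_id")
print(sql)
```

The resulting string could then be sent to a cluster through any Trino client (for example the `trino` Python package), assuming the two catalogs exist there.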

ryscheng avatar Apr 02 '24 04:04 ryscheng

In case it helps, Starburst is probably the best managed offering!

davidgasquez avatar Apr 02 '24 07:04 davidgasquez

Apparently you can run Trino on GCP Dataproc, which surprised me: https://cloud.google.com/dataproc/docs/tutorials/trino-dataproc

ryscheng avatar Apr 05 '24 02:04 ryscheng

For reference, dbt-trino is useful if we want to replace BQ in our data pipeline https://github.com/starburstdata/dbt-trino

ryscheng avatar Apr 30 '24 17:04 ryscheng