trino
Use BigQuery storage read API when reading external BigLake tables
Description
The BigQuery storage APIs support reading BigLake external tables (i.e., external tables with a connection). However, the current implementation reads them through views, which can be expensive because Trino has to issue a SQL query against BigQuery. This PR adds support for reading BigLake tables directly with the storage API.
There is no behavior change for other external tables or for native BigQuery tables: they continue to use the view-based path and the storage API, respectively. Added a new test for BigLake tables.
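As a rough illustration of the routing described above, the sketch below picks a read path from the kind of table being read. It is not Trino's connector code: the `TableKind` and `ReadPath` enums, the `chooseReadPath` method, and the connection ID value are all hypothetical, and it assumes a BigLake table can be recognized as an external table that has a connection ID.

```java
import java.util.Optional;

public class BigQueryReadPathSketch {
    // Hypothetical classification of BigQuery tables (not a Trino or BigQuery client type).
    enum TableKind { NATIVE, EXTERNAL }

    // Hypothetical read strategies: direct storage reads vs. issuing a SQL query first.
    enum ReadPath { STORAGE_READ_API, QUERY_API }

    // Assumption: a BigLake table is an external table with a connection ID.
    static boolean isBigLake(TableKind kind, Optional<String> connectionId) {
        return kind == TableKind.EXTERNAL && connectionId.isPresent();
    }

    static ReadPath chooseReadPath(TableKind kind, Optional<String> connectionId) {
        if (kind == TableKind.NATIVE) {
            // Unchanged: native tables are read with the storage read API.
            return ReadPath.STORAGE_READ_API;
        }
        if (isBigLake(kind, connectionId)) {
            // New behavior: BigLake tables skip the SQL query and use the storage read API.
            return ReadPath.STORAGE_READ_API;
        }
        // Unchanged: other external tables keep the query/view-based path.
        return ReadPath.QUERY_API;
    }

    public static void main(String[] args) {
        System.out.println(chooseReadPath(TableKind.NATIVE, Optional.empty()));
        // "us.my-connection" is a made-up connection ID for illustration only.
        System.out.println(chooseReadPath(TableKind.EXTERNAL, Optional.of("us.my-connection")));
        System.out.println(chooseReadPath(TableKind.EXTERNAL, Optional.empty()));
    }
}
```

Running `main` prints `STORAGE_READ_API`, `STORAGE_READ_API`, then `QUERY_API`, one line per case in the description: native tables, BigLake tables, and other external tables.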
Additional context and related issues
Fixes https://github.com/trinodb/trino/issues/21016
https://cloud.google.com/bigquery/docs/biglake-intro
Release notes
(x) Release notes are required, with the following suggested text:
# BigQuery
* Improve performance when reading external BigLake tables. ({issue}`21016`)
/test-with-secrets sha=9d7bd1dcad92de70856b928a99e443ec3d8b4619
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/8261925064
Support for reading BigLake tables using BigQuery storage read API.
Please remove the trailing period. https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages
Also, I would change it to "Use BigQuery storage read API when reading external BigLake tables",
because the current title looks a little misleading: reading BigLake tables is already supported via the query API.
Done.
@ebyhr Do you have any more feedback or can this be merged?
@ebyhr @hashhar Friendly ping here. We have a GCP customer who is waiting for this PR to be merged.
See https://github.com/trinodb/trino/pull/21017#discussion_r1536270950; I think it's an important question.
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
Can we get this merged?
@anoopj Do you plan to continue this? Or should someone else pick this up and drive it to completion? I see that the newer client has already been released.
@hashhar I am not planning to work on this.
@ssheikin and @hashhar, are you taking this over here or in a new PR? Should we close this one?
Please leave it open for now. We are discussing.
@mosabua I could take up the work on this PR
Sounds good, @Praveen2112. Since @anoopj is not going to continue, you can continue on this PR or start a new one with his work. Just link to this PR if you create a new one.
Continuation of this PR - https://github.com/trinodb/trino/pull/22974