Use BigQuery Read API for reading external BigLake tables
Description
Continuation of https://github.com/trinodb/trino/pull/21017.
The BigQuery Storage API supports reading BigLake external tables (i.e., external tables with a connection). However, the current implementation reads them through views, which can be expensive because it requires Trino to issue a SQL query against BigQuery. This PR adds support for reading BigLake tables directly using the Storage API.
There are no behavior changes for other external tables and BigQuery native tables: they continue to use the view and Storage APIs, respectively. A new test for BigLake tables is added.
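The read-path selection described above can be sketched as a small dispatch. This is an illustrative model only; the enum and method names are hypothetical and not the actual Trino connector code.

```java
// Hypothetical sketch of the read-path dispatch described in this PR.
// Names (TableKind, ReadPath, readPathFor) are illustrative, not Trino's.
class ReadPathDispatch
{
    enum TableKind { NATIVE, BIGLAKE, OTHER_EXTERNAL }
    enum ReadPath { STORAGE_API, VIEW_MATERIALIZATION }

    static ReadPath readPathFor(TableKind kind)
    {
        // Native tables already used the Storage API; this PR routes BigLake
        // tables there too. Other external tables keep the (more expensive)
        // view path, which issues a SQL query against BigQuery.
        return switch (kind) {
            case NATIVE, BIGLAKE -> ReadPath.STORAGE_API;
            case OTHER_EXTERNAL -> ReadPath.VIEW_MATERIALIZATION;
        };
    }

    public static void main(String[] args)
    {
        System.out.println(readPathFor(TableKind.BIGLAKE));
        System.out.println(readPathFor(TableKind.OTHER_EXTERNAL));
    }
}
```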
Additional context and related issues
Fixes https://github.com/trinodb/trino/issues/21016 https://cloud.google.com/bigquery/docs/biglake-intro
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
# BigQuery
* Improve performance when reading external BigLake tables. ({issue}`21016`)
I will squash the commits once I've made sure that all changes have been properly cherry-picked, and then rebase with master.
@marcinsbd Can we update the PR description?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
/test-with-secrets sha=5c2fb2886f53965852b06d5f465b7deef089de5e
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/11923270255
I think we should handle the following limitation:
https://cloud.google.com/bigquery/docs/biglake-intro
The BigQuery Storage API is not available in other cloud environments, such as AWS and Azure.
@ebyhr But for Azure or AKS we could use BigLake Omni, right? Or should we use a flag to control them?
@Praveen2112 I'm not sure how BigLake Omni works in this case. How about adding another condition to BigQueryMetadata#isBigLakeTable? e.g.
`externalTableDefinition.getSourceUris().stream().allMatch(uri -> uri.startsWith("gs://"))`
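The suggested guard can be sketched as a small predicate: accept a table for the Storage API path only when every source URI points at GCS (`gs://`), which would exclude BigLake Omni tables stored on AWS or Azure. The class and method names below are illustrative, not the actual implementation.

```java
import java.util.List;

// Illustrative sketch of the "all source URIs on GCS" guard suggested above.
// Names here are hypothetical, not the actual Trino connector code.
class BigLakeUriCheck
{
    static boolean allSourceUrisOnGcs(List<String> sourceUris)
    {
        // An empty URI list is treated as "not on GCS" so we fall back to
        // the safe (view-based) read path rather than the Storage API.
        return !sourceUris.isEmpty()
                && sourceUris.stream().allMatch(uri -> uri.startsWith("gs://"));
    }

    public static void main(String[] args)
    {
        System.out.println(allSourceUrisOnGcs(List.of("gs://bucket/path/file.parquet")));
        System.out.println(allSourceUrisOnGcs(List.of("s3://bucket/file.parquet"))); // Omni on AWS
    }
}
```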
Hi @anoopj, what would be the correct way to check whether a TableDefinition describes a BigLake table (an external table with a connection ID)? Is there a simple way, using the API, to distinguish a BigLake table from an Object table or an Omni table?
> what would be the correct way to check if a TableDefinition describes BigLake Table (external table with connection id) or not?

Not sure if I understood the question, but a BigLake table will have the connection ID in the table's ExternalDataConfiguration.

> is there a simple way using api to distinguish BigLakeTable from ObjectTable and Omni Table?

You can tell from the dataset region (preferred) or by looking at the sourceUris as mentioned in the above comment.
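Combining both answers, the check could look like the sketch below: a BigLake table carries a connection ID, and restricting the source URIs to `gs://` additionally rules out Omni tables on AWS/Azure. The `ExternalConfig` record is a hypothetical stand-in for the fields read off the real ExternalDataConfiguration, not the BigQuery client class itself.

```java
import java.util.List;

// Illustrative combination of the two criteria from this thread:
// connection ID present (BigLake) + all source URIs on GCS (not Omni).
// ExternalConfig is a hypothetical stand-in, not the BigQuery client class.
class BigLakeTableCheck
{
    record ExternalConfig(String connectionId, List<String> sourceUris) {}

    static boolean isBigLakeTable(ExternalConfig config)
    {
        return config.connectionId() != null
                && config.sourceUris().stream().allMatch(uri -> uri.startsWith("gs://"));
    }

    public static void main(String[] args)
    {
        System.out.println(isBigLakeTable(new ExternalConfig("my-connection", List.of("gs://bucket/data/*"))));
        System.out.println(isBigLakeTable(new ExternalConfig(null, List.of("gs://bucket/data/*")))); // plain external table
        System.out.println(isBigLakeTable(new ExternalConfig("omni-conn", List.of("s3://bucket/data/*")))); // Omni table
    }
}
```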
Thanks @ebyhr, @Praveen2112, @anoopj, @krvikash, @pajaks for the review and your help. AC
@ebyhr, @Praveen2112, @krvikash, could you please do another round of review?
@marcinsbd Can we rebase the PR?