Add support for BigLake metastore in Iceberg REST catalog

Open: ebyhr opened this issue 9 months ago • 6 comments

Description

Add `GOOGLE` as a value of the `iceberg.rest-catalog.security` config property to support BigLake metastore. This PR also adds the `iceberg.rest-catalog.google-project-id` config property.
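For illustration, a catalog file that uses the new properties might look like the following. It is wrapped in a small Java program (Java 15+ for the text block) that parses it with `java.util.Properties` only so the snippet can be checked. Apart from the two property names introduced by this PR, the values are assumptions: the `iceberg.rest-catalog.uri` endpoint is taken to be BigLake's public Iceberg REST endpoint, and the warehouse bucket and project ID are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class BigLakeCatalogConfig {
    // Hypothetical catalog file contents (e.g. etc/catalog/biglake.properties).
    // The URI, warehouse, and project ID values are assumptions/placeholders.
    static final String CATALOG_PROPERTIES = """
            connector.name=iceberg
            iceberg.catalog.type=rest
            iceberg.rest-catalog.uri=https://biglake.googleapis.com/iceberg/v1beta/restcatalog
            iceberg.rest-catalog.warehouse=gs://example-bucket
            iceberg.rest-catalog.security=GOOGLE
            iceberg.rest-catalog.google-project-id=example-project
            """;

    // Parses the text above the same way a .properties file is parsed.
    static Properties load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(CATALOG_PROPERTIES));
        } catch (IOException e) {
            // StringReader does no real I/O, but Properties.load declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load();
        // Print the two properties added by this PR.
        System.out.println(props.getProperty("iceberg.rest-catalog.security"));
        System.out.println(props.getProperty("iceberg.rest-catalog.google-project-id"));
    }
}
```

Running `main` prints `GOOGLE` followed by `example-project`.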

  • Rate limits on BigLake metastore (raised over time; the last value is the current limit):
    • Iceberg REST Catalog read requests per minute: 120 -> 360 -> 500
    • Iceberg REST Catalog namespace and table write requests per minute: 60 -> 180 -> 300
For example, exceeding the read quota fails with:

```
Caused by: org.apache.iceberg.exceptions.RESTException: Unable to process: Quota exceeded for quota metric 'Iceberg REST Catalog read requests' and limit 'Iceberg REST Catalog read requests per minute' of service 'biglake.googleapis.com' for consumer 'project_number:505084745097'.
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:250)
	at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:124)
	at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:108)
	at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:240)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:336)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:297)
	at org.apache.iceberg.rest.BaseHTTPClient.get(BaseHTTPClient.java:77)
	at org.apache.iceberg.rest.RESTSessionCatalog.loadInternal(RESTSessionCatalog.java:375)
```
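When a client exceeds these per-minute quotas, requests fail with an HTTP 429 like the one above. One common client-side mitigation is to retry with exponential backoff. The following is a generic sketch, not the retry logic in Iceberg's `HTTPClient`; `QuotaExceededException` is a hypothetical stand-in for the quota error, and the delays are arbitrary.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public class QuotaBackoff {
    // Hypothetical exception standing in for an HTTP 429 "quota exceeded" response.
    static class QuotaExceededException extends RuntimeException {
        QuotaExceededException(String message) {
            super(message);
        }
    }

    // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
    static Duration backoffDelay(int attempt, Duration base, Duration max) {
        long millis = base.toMillis() << Math.min(attempt, 20); // cap the shift to avoid overflow
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    // Calls `action`, retrying only on QuotaExceededException, at most `maxRetries` times.
    static <T> T callWithRetry(Callable<T> action, int maxRetries, Duration base, Duration max) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return action.call();
            } catch (QuotaExceededException e) {
                if (attempt >= maxRetries) {
                    throw e; // give up and surface the quota error
                }
                Thread.sleep(backoffDelay(attempt, base, max).toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: the first two calls hit the quota, the third succeeds.
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new QuotaExceededException("Quota exceeded");
            }
            return "ok";
        }, 5, Duration.ofMillis(10), Duration.ofMillis(100));
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Running `main` prints `ok after 3 calls`: the first two simulated calls fail, and the third succeeds after waits of 10 ms and 20 ms. Backoff spreads requests out but does not raise the quota itself.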

Relates to https://cloud.google.com/bigquery/docs/blms-rest-catalog

Supersedes #26054

Release notes

## Iceberg
* Add support for BigLake metastore in Iceberg REST catalog. ({issue}`26219`)

ebyhr, Jul 17 '25

This looks fantastic! Thank you for writing this!

rambleraptor, Aug 06 '25

Excited to see this get merged

niger-prequel, Oct 09 '25

I guess BigLake metastore created a new metadata file in the background:

```
Error:  Errors:
Error:    TestIcebergBigLakeMetastoreConnectorSmokeTest>BaseIcebergConnectorSmokeTest.testRegisterTableWithComments:268->AbstractTestQueryFramework.assertUpdate:411->AbstractTestQueryFramework.assertUpdate:416 »
QueryFailed More than one latest metadata file found at location: gs://trino-ci-test/test_iceberg_biglake_gk2j0xnz67/test_register_table_with_comments_xyo3k42eom/metadata, latest metadata files are [
gs://trino-ci-test/test_iceberg_biglake_gk2j0xnz67/test_register_table_with_comments_xyo3k42eom/metadata/00005-6913e897-0000-2d46-9aaa-14223bce81ce.metadata.json,
gs://trino-ci-test/test_iceberg_biglake_gk2j0xnz67/test_register_table_with_comments_xyo3k42eom/metadata/00005-69218711-0000-2ea5-81a1-c82add7ff230.metadata.json]
```

https://github.com/trinodb/trino/actions/runs/19693222597/job/56413468126?pr=26219

@talatuyarer Is my understanding correct? Any way to disable the background task?

ebyhr, Nov 26 '25

Hey! So what happened here is that the last ADD COMMENT call returned a 429 (Too Many Requests), but BigLake metastore appears to have leaked the metadata file that was meant to be persisted. When the client retried, there were two metadata files with version 00005.

So BigLake wasn't creating metadata files in the background per se, and this should happen only rarely: when the last update before a RegisterTable call hits a retryable error.

In any case, that should be fixed, but IMO it shouldn't block submission.
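Why two version-00005 files break the test can be illustrated with a minimal sketch of a "pick the latest metadata file" check: parse the leading version number from each `NNNNN-<uuid>.metadata.json` name and keep the names with the highest version. This is not Trino's actual implementation; `LatestMetadataFinder`, `latestMetadataFiles`, and `pickLatest` are hypothetical names. With the two version-00005 files from the CI failure, the check finds two candidates and refuses to guess.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LatestMetadataFinder {
    // Metadata file names look like 00005-<uuid>.metadata.json; the leading number is the version.
    private static final Pattern METADATA_FILE = Pattern.compile("^(\\d+)-[^/]+\\.metadata\\.json$");

    // Returns every file name that has the highest version among the given names.
    static List<String> latestMetadataFiles(List<String> fileNames) {
        int maxVersion = -1;
        List<String> latest = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = METADATA_FILE.matcher(name);
            if (!m.matches()) {
                continue; // not a table metadata file (e.g. a manifest)
            }
            int version = Integer.parseInt(m.group(1));
            if (version > maxVersion) {
                maxVersion = version;
                latest.clear();
                latest.add(name);
            } else if (version == maxVersion) {
                latest.add(name); // a tie at the top version
            }
        }
        return latest;
    }

    // Fails when the latest version is missing or ambiguous, as in the CI error.
    static String pickLatest(List<String> fileNames) {
        List<String> latest = latestMetadataFiles(fileNames);
        if (latest.size() != 1) {
            throw new IllegalStateException("Expected exactly one latest metadata file, found " + latest);
        }
        return latest.get(0);
    }

    public static void main(String[] args) {
        // The two 00005 names are taken from the CI failure; the 00004 name is made up.
        List<String> files = List.of(
                "00004-aaaa.metadata.json",
                "00005-6913e897-0000-2d46-9aaa-14223bce81ce.metadata.json",
                "00005-69218711-0000-2ea5-81a1-c82add7ff230.metadata.json");
        System.out.println(latestMetadataFiles(files).size());
    }
}
```

Running `main` prints `2`, the number of files tied at version 5; calling `pickLatest` on the same list throws `IllegalStateException`.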

Noremac201, Dec 03 '25

@Noremac201 Thank you! To clarify, does "that should be fixed" refer to Trino or BigLake metastore?

> IMO shouldn't block submission

Unfortunately, we can't merge this PR while we know there are flaky tests. If resolving the root cause will take time, an alternative would be to temporarily disallow the register_table procedure for the BigLake metastore integration.

ebyhr, Dec 04 '25

Ah I see.

I meant it should be fixed on the BigLake side: ideally, we shouldn't leak any metadata files if the transaction doesn't commit on the server side.

Is it possible to just disable that one test, ..._with_comments? Its ~5 updates in quick succession are what trigger the 429. Otherwise, I'll leave it to @talatuyarer to decide whether to disable register table for the time being.

Noremac201, Dec 04 '25

Rebased on master without any changes.

ebyhr, Dec 12 '25

Is the build failure related?

findepi, Dec 12 '25

No, it's a Databricks CI failure (#27642).

ebyhr, Dec 13 '25

@wendigo @findepi Could you please review this PR when you have time?

ebyhr, Dec 17 '25