Let each Iceberg catalog type control new table location empty check
Description
- Move the logic that checks whether a new table location is empty down to the catalog level. The default behavior is the same as before this PR; each catalog implementation can decide whether to override it.
Additional context and related issues
When creating a new table, Trino's Iceberg connector is expected to:
- Determine a target location for the new table
- Verify that the location is empty
- Write the new table metadata to that location
The purpose of this check is to ensure that no location is shared by more than one table. The whole operation is performed by the Trino engine, which does not work for OpenHouse (OH), because in OH the location of a new table is controlled by the table service server. It is not meaningful for Trino to perform this check for OH: OH does not provide an API to determine the target location up front, and even if Trino computed a candidate path, there is no guarantee it would match the final path, since the table service is ultimately in charge of writing the table metadata.
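For concreteness, here is a minimal sketch of what such an emptiness check can look like, assuming Trino's filesystem SPI (io.trino.filesystem); the helper class and method names are illustrative, not the connector's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import io.trino.filesystem.FileIterator;
import io.trino.filesystem.Location;
import io.trino.filesystem.TrinoFileSystem;

// Hypothetical helper; sketches the "verify the location is empty" step above.
final class LocationChecks
{
    private LocationChecks() {}

    // Refuse to create a table if anything already exists under the target
    // location, so that no two tables end up sharing the same storage path.
    static void verifyLocationIsEmpty(TrinoFileSystem fileSystem, String location)
    {
        try {
            FileIterator files = fileSystem.listFiles(Location.of(location));
            if (files.hasNext()) {
                throw new IllegalStateException("Cannot create a table on a non-empty location: " + location);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException("Failed checking contents of location: " + location, e);
        }
    }
}
```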
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Jing-Wei Lu. This is most likely caused by a git client misconfiguration; please make sure to:
- Check if your git client is configured with an email to sign commits:
  git config --list | grep email
- If not, set it up using:
  git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails
Can you elaborate on the need for this?
My intuition is that if this is catalog-dependent, we don't need a config toggle; the catalog should drive the logic as needed. But I don't have the full picture yet.
@findepi Thanks for taking a look.
I am implementing write support for OpenHouse. Currently, the location for a new table is determined by the remote table service, which handles writing the table metadata and then adjusts access afterwards so that the table is readable (we already have read support). We cannot follow the other catalog types by appending the namespace location and the new table name to form the target location, because that logic is dictated by the table service and is subject to change.
I agree that this behavior is catalog-dependent and that this should be handled at the catalog level if we were to allow skipping the check that the new table location is empty. I will convert this PR to a draft for now.
@findepi Can you take a look when you get a chance? I added a method in TrinoCatalog for checking the new table location so that each catalog can decide whether to override this check. Please let me know if it makes sense. Thanks.
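For reference, a rough sketch of the shape described in the comment above, reusing the LocationChecks helper sketched earlier; the method name, signature, and default body are assumptions for illustration, not the actual PR diff:

```java
import io.trino.filesystem.TrinoFileSystem;

public interface TrinoCatalog
{
    // ... existing catalog methods elided ...

    // Default behavior matches the pre-PR engine-level check; catalogs whose
    // backing service assigns and manages table locations can override this
    // with a no-op.
    default void verifyNewTableLocation(TrinoFileSystem fileSystem, String location)
    {
        LocationChecks.verifyLocationIsEmpty(fileSystem, location);
    }
}
```

A default method keeps all existing catalog implementations unchanged while letting a catalog like OpenHouse opt out.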
Thanks @findepi for taking a look; Lei from the OpenHouse (OH) team here.
Think of OH as a catalog layer that determines the table location on the fly: from the client's (in this case, the engine's) perspective, it hands in the table identifier, schema, and some other metadata, and the OpenHouse server then allocates the table location on storage and returns tableLocation as part of the response. OpenHouse made the decision to manage the allocation of tableLocation as well as the authoring of metadata.json in Iceberg, in order to narrow the API exposed to end users (as compared to letting users write metadata.json themselves and ask the catalog layer to persist the new location of the json file).
The point to be made here is that there is no way for Trino to determine the table location before the table is actually created, so this emptiness check should be abstracted away.
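Under that model, an OpenHouse-backed catalog would simply override the check as a no-op. A hypothetical sketch, following the interface shape above (the class name is illustrative only):

```java
import io.trino.filesystem.TrinoFileSystem;

public class OpenHouseTrinoCatalog implements TrinoCatalog
{
    @Override
    public void verifyNewTableLocation(TrinoFileSystem fileSystem, String location)
    {
        // Intentionally a no-op: the OpenHouse table service allocates the
        // table location and writes metadata.json itself, so Trino cannot
        // determine (or meaningfully verify) the location up front.
    }
}
```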
@findepi Friendly ping. Can you take a look when you have time? Lei from the OpenHouse project has left a comment with the reasoning as well. Thank you!
@weijiii thanks for reminding us about this PR. I wonder what @electrum's thoughts on this are.
@electrum Can you take a look when you get a chance? Thank you!
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Hi @bitsondatadev, can you take a look at this change? Thanks!
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@findinpath @cwsteinbach @electrum, can you chime in here and see what the next steps might be (apart from the necessary rebase by @weijiii)?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
👋 @weijiii please feel free to reopen if you want to continue work on this PR and we can help with finding reviewers.