trino icon indicating copy to clipboard operation
trino copied to clipboard

Let each Iceberg catalog type control new table location empty check

Open weijiii opened this issue 10 months ago • 10 comments

Description

  • Move the logic to check if the new table location is empty to catalog level. The default behavior is the same as before this PR, which each catalog implementation can decide whether to override or not.

Additional context and related issues

When creating a new table, Trino's Iceberg connector expects to

  1. Determine a target location for the new table
  2. Verify that the location is empty
  3. Write the new table metadata to that location.

The purpose of this check is to reassure that no location is shared by more than one tables. This whole operation is performed by Trino engine, which is not applicable to OpenHouse (OH), because for OH the location of a new table is controlled by the table service server. It is not meaningful for Trino to perform this check for OH since OH does not provide API to determine the target location, and even if there is logic to provide a target path, it is not guaranteed to be the same path since the table service is in charge of the table metadata write eventually.

Release notes

(x) This is not user-visible or is docs only, and no release notes are required. ( ) Release notes are required. Please propose a release note for me. ( ) Release notes are required, with the following suggested text:

weijiii avatar Apr 24 '24 00:04 weijiii

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Jing-Wei Lu. This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot[bot] avatar Apr 24 '24 00:04 cla-bot[bot]

Can you elaborate on the need for this?

My intuition is that if this is catalog-dependent, we don't need a config toggle, the catalog should drive the logic, as needed. But i don't have the full picture yet.

findepi avatar May 16 '24 08:05 findepi

Can you elaborate on the need for this?

My intuition is that if this is catalog-dependent, we don't need a config toggle, the catalog should drive the logic, as needed. But i don't have the full picture yet.

@findepi Thanks for taking a look.

I am implementing the write support for OpenHouse. Currently the location for a new table is determined by the remote table service. The remote table service would handle writing the table metadata and then modify the access afterwards so that it is readable (we already have read support). We are not able to follow the other catalog type by appending the namespace location and new table name as the target location since the logic is dictated by the table service so this is subject to changes.

I agree that this behavior is catalog-dependent and that this should be at the catalog level if we were to allow not checking whether the new table location is empty. I will convert this PR to a draft for now

weijiii avatar May 22 '24 22:05 weijiii

@findepi Can you take a look when you get a chance? I added a method in TrinoCatalog for checking the new table location so that each catalog can decide whether to override this check. Please let me know if it makes sense. Thanks.

weijiii avatar May 24 '24 03:05 weijiii

Can you elaborate on the need for this?

My intuition is that if this is catalog-dependent, we don't need a config toggle, the catalog should drive the logic, as needed. But i don't have the full picture yet.

Thanks @findepi for taking a look, Lei from OpenHouse(OH) team here.

Think of OH as a catalog layer that determines the table location on-the-fly, so that from client's (in this case, engine) perspective it handed in table identifier, schema and some other metadata, OpenHouse server will then allocate the table location on the storage and return tableLocation as part of the response. OpenHouse made the decision to manage the allocation of tableLocation as well as authoring metadata.json in Iceberg, so to narrow down the API exposed to end users (as comparing to letting users write metadata.json themselves and ask catalog layer to persist the new location of json file)

The point to be made here is, there's no way for Trino to determine table location before table is actually created, thus this existence check could be abstracted away.

autumnust avatar May 28 '24 21:05 autumnust

@findepi Friendly ping. Can you take a look when you have time? Lei form the OpenHouse project has left a comment for the reasoning as well. Thank you!

weijiii avatar Jun 03 '24 03:06 weijiii

@weijiii thanks for reminding us about this PR. I wonder what @electrum 's thoughts on this are

findepi avatar Jun 03 '24 08:06 findepi

@electrum Can you take a look when you get a chance? Thank you!

weijiii avatar Jun 06 '24 00:06 weijiii

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jun 27 '24 17:06 github-actions[bot]

Hi @bitsondatadev Can you take a look at this change? Thanks

weijiii avatar Jun 27 '24 17:06 weijiii

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Jul 22 '24 17:07 github-actions[bot]

@findinpath @cwsteinbach @electrum .. can you chime in here and see what next steps might be (apart from the necessary rebase by @weijiii )

mosabua avatar Jul 22 '24 17:07 mosabua

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Aug 13 '24 17:08 github-actions[bot]

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

github-actions[bot] avatar Sep 04 '24 17:09 github-actions[bot]

👋 @weijiii please feel free to reopen if you want to continue work on this PR and we can help with finding reviewers.

mosabua avatar Sep 04 '24 21:09 mosabua