Let each Iceberg catalog type control new table location empty check
Description
- Move the logic that checks whether a new table location is empty down to the catalog level. The default behavior is the same as before this PR; each catalog implementation can decide whether to override it.
Additional context and related issues
When creating a new table, Trino's Iceberg connector is expected to:
- Determine a target location for the new table
- Verify that the location is empty
- Write the new table metadata to that location
The purpose of this check is to ensure that no location is shared by more than one table. The whole operation is performed by the Trino engine, which does not work for OpenHouse (OH), because in OH the location of a new table is controlled by the table service server. It is not meaningful for Trino to perform this check for OH: OH does not provide an API to determine the target location up front, and even if Trino computed a candidate path, there is no guarantee it would match the final path, since the table service is ultimately in charge of writing the table metadata.
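For concreteness, here is a minimal sketch of what such an emptiness check can look like, assuming Trino's filesystem SPI (io.trino.filesystem); the helper class and method names are illustrative, not the connector's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import io.trino.filesystem.FileIterator;
import io.trino.filesystem.Location;
import io.trino.filesystem.TrinoFileSystem;

// Hypothetical helper; sketches the "verify the location is empty" step above.
final class LocationChecks
{
    private LocationChecks() {}

    // Refuse to create a table if anything already exists under the target
    // location, so that no two tables end up sharing the same storage path.
    static void verifyLocationIsEmpty(TrinoFileSystem fileSystem, String location)
    {
        try {
            FileIterator files = fileSystem.listFiles(Location.of(location));
            if (files.hasNext()) {
                throw new IllegalStateException("Cannot create a table on a non-empty location: " + location);
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException("Failed checking contents of location: " + location, e);
        }
    }
}
```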
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Jing-Wei Lu. This is most likely caused by a git client misconfiguration; please make sure to:
- Check if your git client is configured with an email to sign commits:
  git config --list | grep email
- If not, set it up using:
  git config --global user.email [email protected]
- Make sure that the git commit email is configured in your GitHub account settings; see https://github.com/settings/emails
Can you elaborate on the need for this?
My intuition is that if this is catalog-dependent, we don't need a config toggle; the catalog should drive the logic as needed. But I don't have the full picture yet.
@findepi Thanks for taking a look.
I am implementing write support for OpenHouse. Currently, the location for a new table is determined by the remote table service, which handles writing the table metadata and then adjusts access afterwards so that the table is readable (we already have read support). We cannot follow the other catalog types by appending the namespace location and the new table name to form the target location, because that logic is dictated by the table service and is subject to change.
I agree that this behavior is catalog-dependent and that this should be handled at the catalog level if we were to allow skipping the check that the new table location is empty. I will convert this PR to a draft for now.
@findepi Can you take a look when you get a chance? I added a method in TrinoCatalog for checking the new table location so that each catalog can decide whether to override this check. Please let me know if it makes sense. Thanks.
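For reference, a rough sketch of the shape described in the comment above, reusing the LocationChecks helper sketched earlier; the method name, signature, and default body are assumptions for illustration, not the actual PR diff:

```java
import io.trino.filesystem.TrinoFileSystem;

public interface TrinoCatalog
{
    // ... existing catalog methods elided ...

    // Default behavior matches the pre-PR engine-level check; catalogs whose
    // backing service assigns and manages table locations can override this
    // with a no-op.
    default void verifyNewTableLocation(TrinoFileSystem fileSystem, String location)
    {
        LocationChecks.verifyLocationIsEmpty(fileSystem, location);
    }
}
```

A default method keeps all existing catalog implementations unchanged while letting a catalog like OpenHouse opt out.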
Thanks @findepi for taking a look; Lei from the OpenHouse (OH) team here.
Think of OH as a catalog layer that determines the table location on the fly: from the client's (in this case, the engine's) perspective, it hands in the table identifier, schema, and some other metadata, and the OpenHouse server then allocates the table location on storage and returns tableLocation as part of the response. OpenHouse made the decision to manage the allocation of tableLocation as well as the authoring of metadata.json in Iceberg, in order to narrow the API exposed to end users (as compared to letting users write metadata.json themselves and ask the catalog layer to persist the new location of the json file).
The point to be made here is that there is no way for Trino to determine the table location before the table is actually created, so this emptiness check should be abstracted away.
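Under that model, an OpenHouse-backed catalog would simply override the check as a no-op. A hypothetical sketch, following the interface shape above (the class name is illustrative only):

```java
import io.trino.filesystem.TrinoFileSystem;

public class OpenHouseTrinoCatalog implements TrinoCatalog
{
    @Override
    public void verifyNewTableLocation(TrinoFileSystem fileSystem, String location)
    {
        // Intentionally a no-op: the OpenHouse table service allocates the
        // table location and writes metadata.json itself, so Trino cannot
        // determine (or meaningfully verify) the location up front.
    }
}
```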
@findepi Friendly ping. Can you take a look when you have time? Lei from the OpenHouse project has left a comment with the reasoning as well. Thank you!
@weijiii thanks for reminding us about this PR. I wonder what @electrum's thoughts on this are.
@electrum Can you take a look when you get a chance? Thank you!
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Hi @bitsondatadev, can you take a look at this change? Thanks!
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@findinpath @cwsteinbach @electrum, can you chime in here and see what the next steps might be (apart from the necessary rebase by @weijiii)?
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.
👋 @weijiii please feel free to reopen if you want to continue work on this PR and we can help with finding reviewers.