trino icon indicating copy to clipboard operation
trino copied to clipboard

Add Iceberg RESTSessionCatalog Implementation

Open danielcweeks opened this issue 3 years ago • 7 comments

Description

This PR adds and the REST Catalog implementation for Iceberg.

Is this change a fix, improvement, new feature, refactoring, or other?

New Feature / Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Connector: Iceberg

How would you describe this change to a non-technical end user or system administrator?

This allows Trino to access any data source that supports the Iceberg REST Catalog spec.

Related issues, pull requests, and links

Documentation

( ) No documentation is needed. ( ) Sufficient documentation is included in this PR. ( ) Documentation PR is available with #prnumber. ( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required. () Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

danielcweeks avatar Jul 21 '22 18:07 danielcweeks

Not having View or MV support is pretty limiting. Is that a restriction from the backend, or can we implement them as follow up?

alexjo2144 avatar Aug 11 '22 19:08 alexjo2144

Not having View or MV support is pretty limiting. Is that a restriction from the backend, or can we implement them as follow up?

We definitely want to add view support as a follow-up. However, the iceberg view spec is still in review, so we should wait until that is finalized.

danielcweeks avatar Aug 11 '22 19:08 danielcweeks

Lett initial comments. Is it possible to add a test for iceberg.rest.token and iceberg.rest.credential?

@ebyhr It may be possible to add tests to validate that the token and credential values are passed through, but would require a lot of additional infrastructure to support actually testing the OAuth2 flows. The logic for all of that is already handled by the underlying Iceberg library and there are extensive tests to validate the flow behaviors.

danielcweeks avatar Aug 16 '22 16:08 danielcweeks

@alexjo2144 and @ebyhr I think we've addressed all the comments and (thanks to a few upstream updates), the tests updates as well. Please take a look when you have a chance.

danielcweeks avatar Aug 19 '22 16:08 danielcweeks

Thanks @alexjo2144, if there aren't any more major comments, we'd really like to get this in now that it's a rather large PR. Happy to keep iterating with smaller PRs as follow ups. Let me know if there's anything we can do to expedite this.

danielcweeks avatar Sep 12 '22 20:09 danielcweeks

No problem. It does look like there are some merge conflicts to resolve when you get a chance @danielcweeks.

And if you could create two GH issues for the follow ups. I think one about setting a namespace location, and another one for views/materialized views?

I can't merge this though, as I am not a maintainer. @ebyhr @findepi could one of you take a look?

alexjo2144 avatar Sep 12 '22 21:09 alexjo2144

I've rebased the PR and also opened https://github.com/trinodb/trino/issues/14119 & https://github.com/trinodb/trino/issues/14120

nastra avatar Sep 13 '22 18:09 nastra

I think that it will be good to add some explanation about how the REST Catalog works. 3-4 sentences would help people very much with adopting this. General explanations of the implications of using this would also be a good addition (e.g. does the usage of this enables us to deploy Trino with Iceberg connector without Hive Metastore?).

Otherwise, seems awesome

jebnix avatar Nov 24 '22 20:11 jebnix

@jebnix I think the current docs are OK. @danielcweeks correct me if I'm wrong but there is no need for Hive Metastore when configuring Iceberg with the REST Catalog, but it is required to use some Catalog behind the scenes (e.g. Nessie / JDBC)

code-magician323 avatar Nov 25 '22 13:11 code-magician323

@jebnix I think the current docs are OK. @danielcweeks correct me if I'm wrong but there is no need for Hive Metastore when configuring Iceberg with the REST Catalog, but it is required to use some Catalog behind the scenes (e.g. Nessie / JDBC)

That's correct. There's no dependency on Hive, but you need a server side implementation of the REST spec, which can just be a proxied version of one of the existing Iceberg catalog implementations.

danielcweeks avatar Nov 27 '22 18:11 danielcweeks

@electrum I've updated to add the configuration for session info which defaults to NONE and there is a PR to clarify the REST behavior. Please take another look when you can.

danielcweeks avatar Nov 27 '22 18:11 danielcweeks

@danielcweeks @nastra Do you think this can get merged soon? Trino still cannot use Iceberg without a Hive Metastore. That's the most important feature of the Iceberg Connector by far.

jebnix avatar Dec 15 '22 18:12 jebnix

I believe nothing is left to be done from our side, it is mainly waiting for a final approval

nastra avatar Dec 15 '22 18:12 nastra

@bitsondatadev Why doesn't this get merged?

code-magician323 avatar Dec 16 '22 14:12 code-magician323

@bitsondatadev Why doesn't this get merged?

I'm gonna tag in @electrum and @findepi here. I'm seeing activity in general start to slow down for development as we approach the holidays so I wouldn't hold your breathe for too much happening until after new years.

Just like Santa 🎅 I'm making a list but my list contains PRs I'll be checking twice when I get back from break.

bitsondatadev avatar Dec 20 '22 15:12 bitsondatadev

@bitsondatadev Why doesn't this get merged?

I'm gonna tag in @electrum and @findepi here. I'm seeing activity in general start to slow down for development as we approach the holidays so I wouldn't hold your breathe for too much happening until after new years.

Just like Santa 🎅 I'm making a list but my list contains PRs I'll be checking twice when I get back from break.

I stand corrected! Happy holidays!

bitsondatadev avatar Dec 23 '22 00:12 bitsondatadev