trino icon indicating copy to clipboard operation
trino copied to clipboard

Iceberg View Rest Catalog support

Open amogh-jahagirdar opened this issue 1 year ago • 17 comments

Description

This change integrates Iceberg View support https://iceberg.apache.org/view-spec/ in the REST catalog.

Additional context and related issues

See #14120

Release notes

( ) This is not user-visible or is docs only, and no release notes are required. (X ) Release notes are required. Please propose a release note for me. ( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

amogh-jahagirdar avatar Nov 19 '23 08:11 amogh-jahagirdar

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Nov 19 '23 08:11 cla-bot[bot]

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Nov 28 '23 19:11 cla-bot[bot]

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Nov 28 '23 19:11 cla-bot[bot]

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Nov 28 '23 19:11 cla-bot[bot]

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar Nov 28 '23 20:11 cla-bot[bot]

Ok so there's still some things I need to look into, here's my list:

1.) Dialect resolution will be a key point of discussion https://github.com/trinodb/trino/pull/19818#discussion_r1400212612 -> refer here for the details.

2.) Updating the comment on a column on view should be implemented.

3.) Testing. Right now Trino REST smoke tests use JDBC catalog which doesn't implement ViewCatalog. We pass through some Hadoop HDFS configuration through to the JDBC catalog. For testing views, what I have done currently is a catalog implementation which uses both the JDBC catalog and InMemoryCatalog for views. It's a bit messy, I'll look into options here. Ideally we could just use the InMemoryCatalog but I don't think there's a way to pass in the Hadoop config details we would want.

amogh-jahagirdar avatar Nov 28 '23 20:11 amogh-jahagirdar

1.) Dialect resolution will be a key point of discussion #19818 (comment) -> refer here for the details.

please see https://github.com/trinodb/trino/pull/19818#discussion_r1409256153 for how to unblock this PR.

findepi avatar Nov 29 '23 13:11 findepi

cc @alexjo2144 @findinpath

findepi avatar Nov 29 '23 13:11 findepi

On Hive views, those seem to leverage Coral, which I don't think we want to do here. Instead of an IR layer that can be represented as a separate dialect in the future in Iceberg views

@amogh-jahagirdar Could you please clarify why using Coral does not work? Looks like it achieves points (1) and (2) in your original comment, in addition to fixing the correctness issues in the first iteration of this PR.

I think it makes sense to leverage Coral and use engine's native SQL on the source (input) as the SOT for the logic specification. In fact, on the view creation path, to create an IR, the IR must adhere to the SQL spec of the input/source engine. Therefore it is not clear if IR as a dialect approach is fundamentally different given the path is still input SQL --> IR --> output SQL.

On the flip side, using an IR-as-a-dialect:

  • Does not eliminate translation (on both input and output sides). So it essentially will reimplement translation method.
  • Makes the number of dialects n + 1.
  • Introduces a dialect that is neither natively supported by an engine nor readable by humans.

@findepi @dain Any thoughts?

wmoustafa avatar Feb 03 '24 06:02 wmoustafa

Could you please clarify why using Coral does not work? Looks like it achieves points (1) and (2) in your original comment, in addition to fixing the correctness issues in the first iteration of this PR.

@wmoustafa I don't think the statement implies that Coral does not or cannot work, but we would prefer to keep that consideration out of scope for the initial PR. With respect to IR representations or translation, we kept those ideas in mind with the view specification, but we have serious reservations and want address the primary case where users just want to use Trino with a Trino dialect.

If someone wants to use Iceberg views across Spark/Trino, I think we would prefer they write the view in Trino since it adheres more closely to the standards. As a follow up, we could introduce a mode where Trino can attempt to process other view dialects either natively or using something like Coral, but I would expect that users would need to explicitly opt into that behavior as it would potentially introduce semantic differences.

danielcweeks avatar Feb 03 '24 15:02 danielcweeks

We have talked a few times about this space offline, so I'd like to summarize what I remember:

  • View support in Iceberg with a dialect declaration is a welcomed feature.
  • Trino will definitely support a Trino dialect.
  • Trino will optionall support converting other dialects to Trino via Coral (likely just Hive and Spark).
  • Trino will store views in the Trino syntax.
  • I am supportive of Iceberg defining an simplified Iceberg dialect as long as it has very limited rules, and hightly spedified expeced semantics. I also don't care if the dialect is text or structured, as long as the above holds.
  • I can imagine that if a Trino created view matches an Iceberg dialect we could write as Iceberg instead of Trino.
  • I (and many others) am highly skeptical of other external projects to create a generic IR projects. They tend to be tuned to one specific enginer that doesn't match ANSI SQL or Trino, or require implementing all possible features of all possible SQL engines. So I can't imagine a world where we sould consider supporting something like this.

dain avatar Feb 04 '24 19:02 dain

Could you please clarify why using Coral does not work? Looks like it achieves points (1) and (2) in your original comment, in addition to fixing the correctness issues in the first iteration of this PR.

@wmoustafa Sorry for the confusion, essentially what @danielcweeks said. What I meant was that I think Coral or any IR representations/translations should be kept out of scope for this PR. In the latest implementation, view resolution is strict and looks for a Trino dialect specifically. I think that's a good starting point for Iceberg view integration in Trino.

In a future PR we could consider other options like a non-strict mode (what I had on the first iteration), Coral, or some other IR project. Based on the discussion, there looks to be an interest in using Coral for dialect conversion in Trino which again I think can be done in a follow up PR.

amogh-jahagirdar avatar Feb 06 '24 15:02 amogh-jahagirdar

can we take this further? There are users asking the view support for Nessie catalog aswell. I am waiting for the conclusions on this PR to avoid some back and forth work.

ajantha-bhat avatar Apr 01 '24 13:04 ajantha-bhat

cc: @dain, @findepi, @findinpath, @ebyhr

ajantha-bhat avatar Apr 10 '24 01:04 ajantha-bhat

Hello guys! We are really looking forward to using this feature in a production environment. Do you think it could be merged soon? Have a nice day!

Martomate avatar Apr 29 '24 07:04 Martomate

can we take this further? There are users asking the view support for Nessie catalog aswell. I am waiting for the conclusions on this PR to avoid some back and forth work.

that's a good call. in the meantime, @ajantha-bhat can you please help us with https://github.com/trinodb/trino/issues/20323?

findepi avatar Apr 29 '24 10:04 findepi

My latest update addressed all the comments @electrum , thanks for the review!

amogh-jahagirdar avatar May 21 '24 21:05 amogh-jahagirdar

Thanks!

electrum avatar May 22 '24 22:05 electrum

Dont we need to document this somehow? Also .. need release notes entry.

mosabua avatar May 23 '24 03:05 mosabua

Good point @mosabua we certainly should, I'm happy to raise another PR to document this feature.

amogh-jahagirdar avatar May 23 '24 14:05 amogh-jahagirdar

Great @amogh-jahagirdar - pull me in as reviewer and I can help with merge

mosabua avatar May 24 '24 16:05 mosabua