iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Hive: Add View support for HIVE catalog

Open nk1506 opened this issue 1 year ago • 11 comments

This changes include:

  1. Introduction of common metadata interface(BaseMetadata) for table and view.
  2. Refactor for HiveTableOperation to have common code for table and view commits.

Ref: https://github.com/apache/iceberg/pull/9682 , https://github.com/apache/iceberg/pull/9461 , https://github.com/apache/iceberg/pull/8907

nk1506 avatar Mar 02 '24 18:03 nk1506

@nastra build is failing because of following reason: Error: The log was not found. It may have been deleted based on retention settings.

Ref: https://github.com/apache/iceberg/actions/runs/8229843245/job/22501837474?pr=9852

Any pointer or should I create a bug for the same?

nk1506 avatar Mar 11 '24 12:03 nk1506

@nastra build is failing because of following reason: Error: The log was not found. It may have been deleted based on retention settings.

Ref: https://github.com/apache/iceberg/actions/runs/8229843245/job/22501837474?pr=9852

Any pointer or should I create a bug for the same?

This is because the GH hosted runner for Spark 3.3 runs out of disk space. It's a known issue that @manuzhang attempted to fix in https://github.com/apache/iceberg/pull/9786

nastra avatar Mar 11 '24 14:03 nastra

This PR is looking meshy with all the refactor changes. I have moved the common code for table view with this new PR . This PR can go as only view related changes if the previous is getting merged before.

nk1506 avatar Mar 19 '24 09:03 nk1506

@szehon-ho / @nastra , Please let me know if there are any comments left .

nk1506 avatar Apr 03 '24 12:04 nk1506

Hi @szehon-ho / @pvary / @nastra , Please take a look .

nk1506 avatar Apr 15 '24 14:04 nk1506

@danielcweeks , @pvary , @szehon-ho . Please help here to bring this to closure. It really requires a look from Hive-Experts.

nk1506 avatar Jun 25 '24 14:06 nk1506

hi, @nk1506 thanks for ping, ill look at this, but just FYI, when I look a few times back this patch seems a bit far, it adds things like ViewAwareTableBuilder and TableAwareViewBuilder that dont make much sense, and it will take some time to think if there is a better way.

Also, because of the earlier review feedback from @nastra it seems we abandoned the effort to unify metadata class to re-use code from HiveTableOperations that I and @pvary initially proposed . If this is the direction we chose, its ok , but it will take personally some time for me to double check everything, as we are re-implementing all the commit code.

Another reviewer may also look at the patch of course, I am just sharing my experience reviewing this one.

szehon-ho avatar Jun 25 '24 17:06 szehon-ho

Thanks a lot @szehon-ho .

hi, @nk1506 thanks for ping, ill look at this, but just FYI, when I look a few times back this patch seems a bit far, it adds things like ViewAwareTableBuilder and TableAwareViewBuilder that dont make much sense, and it will take some time to think if there is a better way.

Since other catalogs have followed the same strategy, I kept those changes out of this PR's scope.

Also, because of the earlier review feedback from @nastra it seems we abandoned the effort to unify metadata class to re-use code from HiveTableOperations that I and @pvary initially proposed . If this is the direction we chose, its ok , but it will take personally some time for me to double check everything, as we are re-implementing all the commit code.

As refactoring with common code for table and view this PR became very big. and it was being difficult to review. We definitely want to bring the common part together. I think we should do it with another patch.

Another reviewer may also look at the patch of course, I am just sharing my experience reviewing this one.

@nastra has reviewed throughly on the HiveCatalog and related parts. He proposed to get feedback on the Hive core parts from experts.

nk1506 avatar Jun 25 '24 17:06 nk1506

Hi @nk1506 @danielcweeks Any update for this pull request?

nqvuong1998 avatar Aug 27 '24 08:08 nqvuong1998

Hi @nk1506 @danielcweeks Any update for this pull request?

It still looks like there's a lot of outstanding comments, so I'm not clear if those comments have been addressed. @nk1506 if you still are working on this, please address the comments and mark them resolved, so we can have a better understanding of what's still open. After that, I'm happy to review again.

danielcweeks avatar Aug 27 '24 16:08 danielcweeks

Hi @danielcweeks, Thanks for your feedback. If I recall correctly, @nastra reviewed it and suggested getting additional input from Hive experts. There’s still one open comment related to how we should validate the table/view existence check. Since we plan to use ViewCatalogTests with Hive, the expectation is that the error messages are translated in the same way as other catalogs. To achieve this, the PR includes a helper class.

Please suggest a better approach to address this comment.

nk1506 avatar Aug 27 '24 17:08 nk1506

Hi @danielcweeks , Please let me know your thoughts.

nk1506 avatar Sep 05 '24 16:09 nk1506

From the perspective of a consultant at Dremio, after having talked to several of our customers, this would be a hugely valuable feature for the broader Iceberg adoption. HMS/Glue are still the most common lakehouse catalog choices we see in large enterprises and Hive will likely stay around for a long time for many of them. Having the ability to define and share view definitions across engines without having to migrate off of Hive would further accelerate Iceberg adoption in many of those companies.

mxmarg avatar Sep 05 '24 18:09 mxmarg

@nk1506 A few comments, but other than that this LGTM @nastra did you also want to take another pass?

danielcweeks avatar Sep 12 '24 20:09 danielcweeks

Build is failing because of flaky issue https://github.com/apache/iceberg/issues/11046

nk1506 avatar Sep 13 '24 14:09 nk1506

thanks @nk1506 for your patience here. Also thanks for everyone that helped out with reviews

nastra avatar Sep 13 '24 16:09 nastra

Thanks @nastra and @danielcweeks for your reviews and feedback.

nk1506 avatar Sep 13 '24 16:09 nk1506

Hi @nastra @danielcweeks , Is this pull request scheduled for inclusion in the v1.7 release?

nqvuong1998 avatar Sep 17 '24 07:09 nqvuong1998

@nqvuong1998 yes that will be shipped with 1.7.0

nastra avatar Sep 17 '24 07:09 nastra