Hive: Add View support for HIVE catalog
This changes include:
- Introduction of common metadata interface(BaseMetadata) for table and view.
- Refactor for HiveTableOperation to have common code for table and view commits.
Ref: https://github.com/apache/iceberg/pull/9682 , https://github.com/apache/iceberg/pull/9461 , https://github.com/apache/iceberg/pull/8907
@nastra build is failing because of following reason:
Error: The log was not found. It may have been deleted based on retention settings.
Ref: https://github.com/apache/iceberg/actions/runs/8229843245/job/22501837474?pr=9852
Any pointer or should I create a bug for the same?
@nastra build is failing because of following reason:
Error: The log was not found. It may have been deleted based on retention settings.Ref: https://github.com/apache/iceberg/actions/runs/8229843245/job/22501837474?pr=9852
Any pointer or should I create a bug for the same?
This is because the GH hosted runner for Spark 3.3 runs out of disk space. It's a known issue that @manuzhang attempted to fix in https://github.com/apache/iceberg/pull/9786
This PR is looking meshy with all the refactor changes. I have moved the common code for table view with this new PR . This PR can go as only view related changes if the previous is getting merged before.
@szehon-ho / @nastra , Please let me know if there are any comments left .
Hi @szehon-ho / @pvary / @nastra , Please take a look .
@danielcweeks , @pvary , @szehon-ho . Please help here to bring this to closure. It really requires a look from Hive-Experts.
hi, @nk1506 thanks for ping, ill look at this, but just FYI, when I look a few times back this patch seems a bit far, it adds things like ViewAwareTableBuilder and TableAwareViewBuilder that dont make much sense, and it will take some time to think if there is a better way.
Also, because of the earlier review feedback from @nastra it seems we abandoned the effort to unify metadata class to re-use code from HiveTableOperations that I and @pvary initially proposed . If this is the direction we chose, its ok , but it will take personally some time for me to double check everything, as we are re-implementing all the commit code.
Another reviewer may also look at the patch of course, I am just sharing my experience reviewing this one.
Thanks a lot @szehon-ho .
hi, @nk1506 thanks for ping, ill look at this, but just FYI, when I look a few times back this patch seems a bit far, it adds things like ViewAwareTableBuilder and TableAwareViewBuilder that dont make much sense, and it will take some time to think if there is a better way.
Since other catalogs have followed the same strategy, I kept those changes out of this PR's scope.
Also, because of the earlier review feedback from @nastra it seems we abandoned the effort to unify metadata class to re-use code from HiveTableOperations that I and @pvary initially proposed . If this is the direction we chose, its ok , but it will take personally some time for me to double check everything, as we are re-implementing all the commit code.
As refactoring with common code for table and view this PR became very big. and it was being difficult to review. We definitely want to bring the common part together. I think we should do it with another patch.
Another reviewer may also look at the patch of course, I am just sharing my experience reviewing this one.
@nastra has reviewed throughly on the HiveCatalog and related parts. He proposed to get feedback on the Hive core parts from experts.
Hi @nk1506 @danielcweeks Any update for this pull request?
Hi @nk1506 @danielcweeks Any update for this pull request?
It still looks like there's a lot of outstanding comments, so I'm not clear if those comments have been addressed. @nk1506 if you still are working on this, please address the comments and mark them resolved, so we can have a better understanding of what's still open. After that, I'm happy to review again.
Hi @danielcweeks, Thanks for your feedback. If I recall correctly, @nastra reviewed it and suggested getting additional input from Hive experts. There’s still one open comment related to how we should validate the table/view existence check. Since we plan to use ViewCatalogTests with Hive, the expectation is that the error messages are translated in the same way as other catalogs. To achieve this, the PR includes a helper class.
Please suggest a better approach to address this comment.
Hi @danielcweeks , Please let me know your thoughts.
From the perspective of a consultant at Dremio, after having talked to several of our customers, this would be a hugely valuable feature for the broader Iceberg adoption. HMS/Glue are still the most common lakehouse catalog choices we see in large enterprises and Hive will likely stay around for a long time for many of them. Having the ability to define and share view definitions across engines without having to migrate off of Hive would further accelerate Iceberg adoption in many of those companies.
@nk1506 A few comments, but other than that this LGTM @nastra did you also want to take another pass?
Build is failing because of flaky issue https://github.com/apache/iceberg/issues/11046
thanks @nk1506 for your patience here. Also thanks for everyone that helped out with reviews
Thanks @nastra and @danielcweeks for your reviews and feedback.
Hi @nastra @danielcweeks , Is this pull request scheduled for inclusion in the v1.7 release?
@nqvuong1998 yes that will be shipped with 1.7.0