HIVE-28411: Bucket Map Join on Iceberg tables
What changes were proposed in this pull request?
Enable Bucket Map Join using the bucket transform of Apache Iceberg. This is part of the Partition-Aware Optimization initiative.
- https://issues.apache.org/jira/browse/HIVE-28411
- https://issues.apache.org/jira/browse/HIVE-28410
- https://docs.google.com/document/d/1srEK3atO2T3Apa-FsF6bW__ECY-nFrev_1RZ8EN4UF8/edit#heading=h.jzie7kdemx93
Why are the changes needed?
Bucket Map Join is known to significantly improve performance in some cases. This PR would make that capability available to any open table format.
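To illustrate why matching bucket layouts help, here is a minimal conceptual sketch in Python. It is not Hive's implementation. The names `bucket_of`, `write_bucketed`, and `bucket_map_join`, and the bucket count of 4, are hypothetical, and a plain modulo stands in for Iceberg's Murmur3-based bucket hash. The point is only that when both sides are bucketed the same way on the join key, each task can build its hash table from one small-side bucket instead of the whole small table.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count shared by both tables


def bucket_of(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Stand-in for a real bucketing function (assumption: plain modulo;
    # Iceberg's bucket transform uses a Murmur3-based hash instead).
    return key % num_buckets


def write_bucketed(rows, num_buckets: int = NUM_BUCKETS):
    """Group (key, value) rows by bucket id, like a bucketed table layout."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def bucket_map_join(big, small):
    """Inner-join two identically bucketed tables, one bucket at a time."""
    joined = []
    for bucket_id, big_rows in big.items():
        # Build the hash table from the matching small-side bucket only.
        hash_table = defaultdict(list)
        for key, value in small.get(bucket_id, []):
            hash_table[key].append(value)
        # Probe with the big-side rows of the same bucket.
        for key, big_value in big_rows:
            for small_value in hash_table.get(key, []):
                joined.append((key, big_value, small_value))
    return joined


orders = write_bucketed([(1, "o1"), (2, "o2"), (5, "o3"), (7, "o4")])
users = write_bucketed([(1, "alice"), (5, "bob"), (6, "carol")])
result = sorted(bucket_map_join(orders, users))
print(result)
```

Rows with the same key always land in the same bucket id on both sides, so no cross-bucket lookups are needed. This only holds when both tables use the same bucketing function and a compatible bucket count.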
Does this PR introduce any user-facing change?
Yes. Execution plans will change for Iceberg users who have bucketed tables. They can disable the optimization by setting hive.convert.join.bucket.mapjoin.tez=false.
I do not expect non-Iceberg users to see any changes.
Is the change a dependency upgrade?
No.
How was this patch tested?
I added multiple tests. iceberg_bucket_map_join_[1-6].q are copied from the original qtests for Hive native tables. iceberg_bucket_map_join_[7-8].q are new test cases.
The first commit adds all test cases and their q.out files on top of the latest master branch, which should help you track how execution plans change with this PR.
Note
This PR doesn't introduce Bucket Map Join using non-bucketing partitions. Also, partition evolution is not supported yet, as you can see in iceberg_bucket_map_join_4.q.
One test failed at revision 4206c61.
- https://ci.hive.apache.org/blue/rest/organizations/jenkins/pipelines/hive-precommit/branches/PR-5409/runs/6/nodes/597/log/?start=0
I don't think this PR changes the behavior of ACID, and I could not reproduce the problem on my local machine.
[2024-09-02T02:44:41.497Z] [INFO] Running org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez
[2024-09-02T03:26:49.915Z] [ERROR] Tests run: 80, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2,520.126 s <<< FAILURE! - in org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez
[2024-09-02T03:26:49.915Z] [ERROR] org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMajorCompactionWithoutBucketsInsertAndDeleteInsertOverwrite Time elapsed: 134.711 s <<< FAILURE!
[2024-09-02T03:26:49.915Z] java.lang.AssertionError: Bucket names are not matching after compaction in the base folder expected:<[bucket_00000, bucket_00001, bucket_00002, bucket_00003, bucket_00004, bucket_00005, bucket_00006, bucket_00007, bucket_00008, bucket_00009, bucket_00010, bucket_00011, bucket_00012, bucket_00013]> but was:<[bucket_00000, bucket_00001, bucket_00002, bucket_00003, bucket_00004, bucket_00005, bucket_00006, bucket_00007, bucket_00008, bucket_00009, bucket_00010, bucket_00011, bucket_00012, bucket_00013, bucket_00014]>
[2024-09-02T03:26:49.915Z] at org.junit.Assert.fail(Assert.java:89)
[2024-09-02T03:26:49.915Z] at org.junit.Assert.failNotEquals(Assert.java:835)
[2024-09-02T03:26:49.915Z] at org.junit.Assert.assertEquals(Assert.java:120)
[2024-09-02T03:26:49.915Z] at org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMinorCompactionWithoutBucketsCommon(TestCrudCompactorOnTez.java:1461)
[2024-09-02T03:26:49.915Z] at org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMajorCompactionWithoutBucketsInsertAndDeleteInsertOverwrite(TestCrudCompactorOnTez.java:1414)
Quality Gate passed
Issues: 6 new issues, 0 accepted issues
Measures: 0 Security Hotspots, 0.0% Coverage on New Code, 0.0% Duplication on New Code
I have fixed the places related to coding style. A couple of comments are still untouched, only because I am trying to recall my original intentions. Just a moment. https://github.com/apache/hive/pull/5409/commits/c6aa9d2d0a81ff2e8834e1350a18405e593fe212
Hey @okumin, thanks for addressing the comments! I think just 3 minor ones remain, plus the Sonar issues. Could you please take a look?
@deniskuzZ I think I have addressed all the comments and applied the cosmetic changes suggested by SonarCloud. (Issues such as cognitive complexity are not handled, as they likely existed before this PR.) Thanks for reviewing this big PR.
Quality Gate passed
Issues: 4 new issues, 0 accepted issues
Measures: 0 Security Hotspots, 0.0% Coverage on New Code, 0.0% Duplication on New Code
Thank you!