HIVE-28411: Bucket Map Join on Iceberg tables

Open okumin opened this issue 1 year ago • 3 comments

What changes were proposed in this pull request?

Enable Bucket Map Join using Bucket Transform of Apache Iceberg. This is a part of the Partition-Aware Optimization initiative.

  • https://issues.apache.org/jira/browse/HIVE-28411
  • https://issues.apache.org/jira/browse/HIVE-28410
  • https://docs.google.com/document/d/1srEK3atO2T3Apa-FsF6bW__ECY-nFrev_1RZ8EN4UF8/edit#heading=h.jzie7kdemx93

Why are the changes needed?

We know Bucket Map Join can significantly improve performance in some cases. This PR would unlock that capability for any Open Table Format.
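As a rough illustration of why this helps, here is a minimal Python sketch of the idea behind a bucket map join. It is not Hive's implementation: `NUM_BUCKETS`, `bucket_of`, `bucketize`, and `bucket_map_join` are hypothetical names, and the modulo function is only a stand-in for a real bucketing hash such as Iceberg's bucket transform. The point is that when both tables are bucketed on the join key with the same function, each task only needs to build a hash table from the one matching small-table bucket rather than from the whole small table.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count shared by both tables


def bucket_of(key: int) -> int:
    # Stand-in bucketing function (assumption); real engines use their own hash.
    return key % NUM_BUCKETS


def bucketize(rows):
    """Group (key, value) rows by bucket id."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_of(key)].append((key, value))
    return buckets


def bucket_map_join(big_rows, small_rows):
    """Inner-join two tables bucketed on the same key, one bucket at a time."""
    big, small = bucketize(big_rows), bucketize(small_rows)
    joined = []
    for bucket_id, big_bucket in big.items():
        # Build the in-memory hash table from the matching small bucket only.
        hash_table = defaultdict(list)
        for key, value in small.get(bucket_id, []):
            hash_table[key].append(value)
        # Probe it with the rows of the corresponding big-table bucket.
        for key, value in big_bucket:
            for small_value in hash_table.get(key, []):
                joined.append((key, value, small_value))
    return sorted(joined)
```

In this sketch, rows with equal keys always share a bucket id, so probing a single small bucket per big bucket cannot miss a match.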

Does this PR introduce any user-facing change?

Yes. Execution plans will change for Iceberg users who have bucketed tables. They can disable the optimization with hive.convert.join.bucket.mapjoin.tez=false.

I expect non-Iceberg users not to see any changes.

Is the change a dependency upgrade?

No.

How was this patch tested?

I added multiple tests. iceberg_bucket_map_join_[1-6].q are copied from the original qtests for Hive native tables. iceberg_bucket_map_join_[7-8].q are new test cases.

The first commit adds all test cases and their q.out files on the latest master branch. It should help you track how the execution plans change with this PR.

Note

This PR doesn't introduce Bucket Map Join using non-bucketing partitions. Also, we don't support the case with partition evolution yet, as you can see in iceberg_bucket_map_join_4.q.

okumin avatar Aug 26 '24 13:08 okumin

One test failed on revision 4206c61.

  • https://ci.hive.apache.org/blue/rest/organizations/jenkins/pipelines/hive-precommit/branches/PR-5409/runs/6/nodes/597/log/?start=0

I think this PR should not change the behavior of ACID, and the problem does not reproduce on my local machine.

[2024-09-02T02:44:41.497Z] [INFO] Running org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez
[2024-09-02T03:26:49.915Z] [ERROR] Tests run: 80, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2,520.126 s <<< FAILURE! - in org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez
[2024-09-02T03:26:49.915Z] [ERROR] org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMajorCompactionWithoutBucketsInsertAndDeleteInsertOverwrite  Time elapsed: 134.711 s  <<< FAILURE!
[2024-09-02T03:26:49.915Z] java.lang.AssertionError: Bucket names are not matching after compaction in the base folder expected:<[bucket_00000, bucket_00001, bucket_00002, bucket_00003, bucket_00004, bucket_00005, bucket_00006, bucket_00007, bucket_00008, bucket_00009, bucket_00010, bucket_00011, bucket_00012, bucket_00013]> but was:<[bucket_00000, bucket_00001, bucket_00002, bucket_00003, bucket_00004, bucket_00005, bucket_00006, bucket_00007, bucket_00008, bucket_00009, bucket_00010, bucket_00011, bucket_00012, bucket_00013, bucket_00014]>
[2024-09-02T03:26:49.915Z] 	at org.junit.Assert.fail(Assert.java:89)
[2024-09-02T03:26:49.915Z] 	at org.junit.Assert.failNotEquals(Assert.java:835)
[2024-09-02T03:26:49.915Z] 	at org.junit.Assert.assertEquals(Assert.java:120)
[2024-09-02T03:26:49.915Z] 	at org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMinorCompactionWithoutBucketsCommon(TestCrudCompactorOnTez.java:1461)
[2024-09-02T03:26:49.915Z] 	at org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez.testMajorCompactionWithoutBucketsInsertAndDeleteInsertOverwrite(TestCrudCompactorOnTez.java:1414)

okumin avatar Sep 02 '24 09:09 okumin

I fixed the places related to coding style. I have left a couple of comments untouched for now, only because I am still trying to recall my original intentions. Just a moment. https://github.com/apache/hive/pull/5409/commits/c6aa9d2d0a81ff2e8834e1350a18405e593fe212

okumin avatar Oct 30 '24 16:10 okumin

hey @okumin, thanks for addressing the comments! I think just 3 minor ones remain + sonar issues, could you please take a look?

deniskuzZ avatar Oct 31 '24 11:10 deniskuzZ

@deniskuzZ I think I applied changes for all comments and the cosmetic changes suggested by SonarCloud (issues such as cognitive complexity are not handled, as they likely existed originally). Thanks for reviewing this big PR.

okumin avatar Oct 31 '24 15:10 okumin

Thank you!

okumin avatar Nov 02 '24 04:11 okumin