HIVE-28798: Bucket Map Join partially using partition transforms
What changes were proposed in this pull request?
This PR updates OpTraitsRulesProcFactory.SelectRule to propagate bucketing information when the source is an Iceberg table and only a subset of its bucketing columns is used.
https://issues.apache.org/jira/browse/HIVE-28798
Why are the changes needed?
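For better performance. Iceberg's partition transform spec allows multiple columns to be bucketed separately, e.g., data stored in /warehouse/db/table/data/key1=3/key2=5. Hive's native bucketing, in contrast, encodes the set of all bucketing columns into a single integer. As a result, an Iceberg table's bucketing can still be exploited when only a subset of the bucketing columns is used.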
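To illustrate the difference, here is a minimal sketch. It is not code from this patch: the class and method names are hypothetical, and the hash functions are simple stand-ins rather than the hashes Iceberg or Hive actually use. It only shows why a per-column bucket id stays usable when one column is ignored, while a combined bucket id does not.

```java
public class BucketingSketch {
    // Stand-in hash for illustration only; not the real Iceberg or Hive hash.
    static int bucketOf(int value, int numBuckets) {
        return Math.floorMod(Integer.hashCode(value), numBuckets);
    }

    // Iceberg-style (assumed shape): each column is bucketed on its own,
    // so every column contributes an independent bucket id.
    static int[] perColumnBuckets(int key1, int key2, int n1, int n2) {
        return new int[] { bucketOf(key1, n1), bucketOf(key2, n2) };
    }

    // Hive-style (assumed shape): all bucketing columns are folded into
    // one hash, which yields a single bucket id.
    static int combinedBucket(int key1, int key2, int numBuckets) {
        int combined = 31 * Integer.hashCode(key1) + Integer.hashCode(key2);
        return Math.floorMod(combined, numBuckets);
    }

    public static void main(String[] args) {
        // Rows that share key1 but differ in key2.
        int[] a = perColumnBuckets(3, 5, 4, 8);
        int[] b = perColumnBuckets(3, 6, 4, 8);
        // The key1 bucket id is identical, so a join on key1 alone can still
        // be routed by bucket.
        System.out.println("per-column key1 buckets: " + a[0] + " and " + b[0]);
        // The combined bucket id depends on key2 as well, so it says nothing
        // about key1 in isolation.
        System.out.println("combined buckets: "
            + combinedBucket(3, 5, 8) + " and " + combinedBucket(3, 6, 8));
    }
}
```

In the per-column case the bucket id for key1 is a function of key1 only, which is the property that lets a join on a subset of the bucketing columns remain a bucket map join.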
Does this PR introduce any user-facing change?
No. The query plan can change, but BMJ on Iceberg has not been released yet.
Is the change a dependency upgrade?
No
How was this patch tested?
I updated iceberg_bucket_map_join_8.q so that it covers various combinations of bucketing columns.
Quality Gate passed
Issues
1 New issue
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.
I have checked that the patch enables BMJ with a subset of bucket columns, and the patch looks good to me. I left a few minor comments.
I rebased this branch because it had become too far out of date. CI is now green.
@ngsg Thanks for your thorough review!