HIVE-28798: Bucket Map Join partially using partition transforms

Open okumin opened this issue 10 months ago • 2 comments

What changes were proposed in this pull request?

This PR updates OpTraitsRulesProcFactory.SelectRule to propagate bucketing information when the source is an Iceberg table and only a subset of the bucketing columns is used.

https://issues.apache.org/jira/browse/HIVE-28798

Why are the changes needed?

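For better performance. Iceberg's partition transform spec lets us bucket multiple columns separately, e.g., with data stored in /warehouse/db/table/data/key1=3/key2=5, whereas Hive's native bucketing encodes the set of all bucketing columns into a single integer. Because each Iceberg bucket column is independent, a Bucket Map Join can still apply when only a subset of the bucketing columns is used.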
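
To illustrate the difference, here is a minimal sketch, not the actual Hive or Iceberg code. The class `BucketingSketch`, its methods, and the toy hash are all hypothetical; the real systems use their own hash functions. It only shows why a per-column bucket stays usable when one key is known, while a combined bucket does not.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and hash are assumptions, not Hive/Iceberg APIs.
class BucketingSketch {
    // Map a hash value to a non-negative bucket id.
    static int bucketOf(int hash, int numBuckets) {
        return Math.floorMod(hash, numBuckets);
    }

    // Iceberg-style: every bucketing column gets its own independent bucket id.
    static int[] perColumnBuckets(int key1, int key2, int n1, int n2) {
        return new int[] {
            bucketOf(Integer.hashCode(key1), n1),
            bucketOf(Integer.hashCode(key2), n2)
        };
    }

    // Hive-style: all bucketing columns are folded into one hash, then one bucket id.
    static int combinedBucket(int key1, int key2, int numBuckets) {
        int hash = 31 * Integer.hashCode(key1) + Integer.hashCode(key2);
        return bucketOf(hash, numBuckets);
    }

    public static void main(String[] args) {
        // The key1 bucket is the same no matter what key2 is,
        // so a join on key1 alone can still be routed by bucket.
        System.out.println(Arrays.toString(perColumnBuckets(3, 5, 4, 8)));
        System.out.println(Arrays.toString(perColumnBuckets(3, 99, 4, 8)));

        // The combined bucket depends on both keys,
        // so key1 alone does not determine the bucket.
        System.out.println(combinedBucket(3, 5, 4));
        System.out.println(combinedBucket(3, 6, 4));
    }
}
```

In this sketch, a join that uses only `key1` can rely on the first element of `perColumnBuckets`, which is the kind of partial bucketing information this PR propagates.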

Does this PR introduce any user-facing change?

No. The query plan can change, but BMJ on Iceberg has not been released yet.

Is the change a dependency upgrade?

No

How was this patch tested?

I updated iceberg_bucket_map_join_8.q so that it includes various combinations.

okumin avatar Mar 03 '25 17:03 okumin

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Jun 05 '25 00:06 github-actions[bot]

I have checked that the patch enables BMJ with a subset of bucket columns, and the patch looks good to me. I left a few minor comments.

ngsg avatar Jul 02 '25 04:07 ngsg

I rebased this branch since it was badly out of date. Now, CI is green.

okumin avatar Jul 14 '25 11:07 okumin

@ngsg Thanks for your thorough review!

okumin avatar Jul 18 '25 12:07 okumin