[SUPPORT] Could Hudi skip shuffle in SortMergeJoin, like what bucketBy does in Spark?


Open: ziudu opened this issue 1 year ago.

As explained in https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4861715144695760/2994977456373837/5701837197372837/latest.html, a SortMergeJoin between two tables that are bucketed on the join key (with the same number of buckets) can skip the shuffle stage.
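To illustrate why bucketing removes the shuffle, here is a minimal Python sketch (not Spark or Hudi code), assuming both tables are bucketed on the join key with the same number of buckets and the same hash function, and that each bucket is sorted by key at write time. Under those assumptions, bucket b of one table can only match bucket b of the other, so each bucket pair is sort-merge joined in place and no rows need to be redistributed. The helper names (`bucket_id`, `bucketize`, `merge_join`, `bucketed_join`) are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash the engine actually uses.

```python
from collections import defaultdict
from zlib import crc32


def bucket_id(key: str, num_buckets: int) -> int:
    # Stand-in hash. Both tables must use the same hash function and bucket
    # count, otherwise equal keys could land in different buckets.
    return crc32(key.encode("utf-8")) % num_buckets


def bucketize(rows, num_buckets):
    """Write-time step: group (key, value) rows by bucket, sort each bucket by key."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return {b: sorted(rs, key=lambda r: r[0]) for b, rs in buckets.items()}


def merge_join(left, right):
    """Sort-merge inner join of two lists of (key, value) already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def bucketed_join(left_buckets, right_buckets, num_buckets):
    """Join bucket b of one side only with bucket b of the other: no rows move."""
    out = []
    for b in range(num_buckets):
        out.extend(merge_join(left_buckets.get(b, []), right_buckets.get(b, [])))
    return out


# Usage: two small "tables" bucketed on the same key with the same bucket count.
NUM_BUCKETS = 4
orders = [("u1", 10), ("u2", 20), ("u1", 30), ("u3", 5)]
users = [("u1", "alice"), ("u2", "bob"), ("u4", "dan")]
result = bucketed_join(
    bucketize(orders, NUM_BUCKETS), bucketize(users, NUM_BUCKETS), NUM_BUCKETS
)
print(sorted(result))  # [('u1', 10, 'alice'), ('u1', 30, 'alice'), ('u2', 20, 'bob')]
```

If the two sides used different bucket counts or hash functions, matching keys could fall into different buckets, and the engine would have to repartition (shuffle) one or both sides before joining.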

Is there anything similar in Hudi? It could greatly improve join performance.

I tried the bucket index, but a join between two tables that both use the bucket index still requires a shuffle.

Feb 19 '24 12:02 ziudu

cc @boneanxs to take care of this issue.

Feb 20 '24 01:02 danny0405

@boneanxs @ziudu Created a JIRA - https://issues.apache.org/jira/browse/HUDI-7561

Apr 01 '24 13:04 ad1happy2go
