[SUPPORT] Could Hudi skip the shuffle in SortMergeJoin, like bucketBy does in Spark?
As explained in https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/4861715144695760/2994977456373837/5701837197372837/latest.html, a join between two bucketed tables can skip the shuffle in a SortMergeJoin.
Is there anything similar in Hudi? I think it could greatly improve join performance.
I tried the bucket index, but a join between two tables that both use the bucket index still requires a shuffle.
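To make the request concrete, here is a minimal pure-Python sketch of why co-bucketed tables can be joined without a shuffle. It is only an illustration of the idea, not Spark's or Hudi's implementation: `NUM_BUCKETS`, `bucket_id`, `write_bucketed`, and the sample rows are all hypothetical, and the modulo hash is a stand-in for whatever hash function the engine really uses. The point is that when both sides were written with the same bucket count on the join key, bucket i of one table only needs to meet bucket i of the other, so no rows have to be repartitioned at join time.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical; both tables must use the same bucket count


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Toy bucket function (assumption); real engines use their own hashing.
    return key % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    """Simulate writing a table bucketed and sorted by the join key."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    # Sort each bucket by key so a merge join can stream through it.
    return {b: sorted(rs, key=lambda r: r[0]) for b, rs in buckets.items()}


def merge_join(left, right):
    """Sort-merge inner join of two lists that are already sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out


def bucketed_join(left_table, right_table):
    """Join bucket i of the left table only with bucket i of the right table."""
    result = []
    for b in sorted(left_table.keys() & right_table.keys()):
        result.extend(merge_join(left_table[b], right_table[b]))
    return result


orders = write_bucketed([(1, "order-a"), (2, "order-b"), (5, "order-c"), (6, "order-d")])
users = write_bucketed([(1, "alice"), (5, "bob"), (7, "carol")])
joined = bucketed_join(orders, users)
print(joined)
```

In Spark the equivalent layout comes from writing both tables with `bucketBy` (and optionally `sortBy`) on the join key with the same number of buckets and saving them with `saveAsTable`. What I am asking is whether Hudi's bucket index layout could be exposed to the planner in a similar way.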
cc @boneanxs to take a look at this issue.
@boneanxs @ziudu Created a JIRA - https://issues.apache.org/jira/browse/HUDI-7561