incubator-uniffle [#1750] feat(remote merge): Support Spark.

What changes were proposed in this pull request?

Support the Spark framework.

Why are the changes needed?

#1750

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests, integration tests, and a real job in a cluster.

Mar 14 '25 04:03 zhengchenyu

Test Results

3,029 files (+18), 3,029 suites (+18), total duration 6h 41m 32s (+1m 40s)
1,182 tests (+7): 1,180 passed (+7), 2 skipped (±0), 0 failed (±0)
14,961 runs (+62): 14,931 passed (+62), 30 skipped (±0), 0 failed (±0)

Results for commit 0c89203b. ± Comparison against base commit 4300f93f.


Mar 14 '25 05:03 github-actions[bot]

From my perspective, this feature currently can't be used in Spark SQL. Perhaps the RDD API could use it.

This test is based on the draft PR https://github.com/apache/spark/pull/50248.

Mar 18 '25 07:03 zhengchenyu

From my perspective, this feature currently can't be used in Spark SQL. Perhaps the RDD API could use it.

This test is based on the draft PR apache/spark#50248.

cc @LuciferYang

Mar 18 '25 07:03 jerqi

From my perspective, this feature currently can't be used in Spark SQL. Perhaps the RDD API could use it.

This test is based on the draft PR apache/spark#50248.

This would break Spark's existing implementation. It would be better to insert a new logical plan node that represents the distribution and partitioning after shuffling; then you would only need to implement some optimization rules.
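The suggestion above can be illustrated with a toy Python model. This is not Spark code: `RemoteMergedShuffle`, `Sort`, `Scan`, and `remove_redundant_sort` are hypothetical names chosen for illustration. The idea is that a dedicated plan node records that its output is partitioned and already sorted by the shuffle keys (as it would be if the shuffle server merged map outputs), and a small rewrite rule then drops a `Sort` that the node already satisfies. In real Spark, such a rule would be a Scala `Rule[LogicalPlan]`, which can be registered through `SparkSessionExtensions.injectOptimizerRule`; the sketch only shows the shape of the idea.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scan:
    table: str
    # Ordering guaranteed by this node's output (none for a plain scan).
    ordering: Tuple[str, ...] = ()


@dataclass(frozen=True)
class RemoteMergedShuffle:
    """Hypothetical node: output is partitioned by `keys` and, because the
    shuffle service merges map outputs, already sorted by `keys` per partition."""
    child: object
    keys: Tuple[str, ...]

    @property
    def ordering(self) -> Tuple[str, ...]:
        return self.keys


@dataclass(frozen=True)
class Sort:
    child: object
    keys: Tuple[str, ...]

    @property
    def ordering(self) -> Tuple[str, ...]:
        return self.keys


def satisfies(provided, required) -> bool:
    # A provided ordering satisfies a requirement if the requirement is its prefix.
    return tuple(provided[:len(required)]) == tuple(required)


def remove_redundant_sort(plan):
    """Toy optimization rule: drop a Sort whose child already delivers that order."""
    if isinstance(plan, Sort):
        child = remove_redundant_sort(plan.child)
        if satisfies(child.ordering, plan.keys):
            return child
        return Sort(child, plan.keys)
    if isinstance(plan, RemoteMergedShuffle):
        return RemoteMergedShuffle(remove_redundant_sort(plan.child), plan.keys)
    return plan
```

For example, `Sort(RemoteMergedShuffle(Scan("t"), ("k",)), ("k",))` is rewritten to just the `RemoteMergedShuffle` node, while a `Sort` directly over a plain `Scan` is kept, because the scan promises no ordering.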

Mar 18 '25 08:03 jerqi

From my perspective, this feature currently can't be used in Spark SQL. Perhaps the RDD API could use it.

This test is based on the draft PR apache/spark#50248.

This would break Spark's existing implementation. It would be better to insert a new logical plan node that represents the distribution and partitioning after shuffling; then you would only need to implement some optimization rules.

Are you talking about the changes to Spark? My initial idea was also to see whether I could add a new rule. For the map side, I could perhaps add new rules. But on the reduce side, whether a new SortExec is added depends on whether the required distribution and partitioning match, which is hard to control by adding a new rule. As for the draft PR with changes to Spark, it is only a draft to verify the feasibility of this proposal, and some of its code architecture still needs to be refactored. For example, some of the in-memory partial aggregation logic needs rework, and some logic should be added to the rule.
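The reduce-side point above can be modeled with a toy Python sketch. It assumes a simplified version of what Spark's physical `EnsureRequirements` rule does for an operator that needs its input clustered and sorted by some keys: first add a shuffle exchange if the child's partitioning does not match, then add a sort if the child's reported output ordering does not satisfy the required ordering. Because a plain shuffle reports no ordering, a sort is always added after it; avoiding that sort requires the exchange node itself to advertise the ordering, which is hard to achieve with an extra rule alone. `Node`, `ensure_requirements`, and the `sorted_shuffle` flag are hypothetical, and partitioning is compared by exact key match, which is a simplification of Spark's distribution semantics.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    name: str
    children: Tuple["Node", ...] = ()
    partitioned_by: Tuple[str, ...] = ()  # keys the output is partitioned by
    ordering: Tuple[str, ...] = ()        # keys each output partition is sorted by


def ordering_satisfies(provided, required) -> bool:
    # The required ordering must be a prefix of the provided ordering.
    return tuple(provided[:len(required)]) == tuple(required)


def ensure_requirements(child: Node, required_keys, sorted_shuffle: bool = False) -> Node:
    """Toy version of the reduce-side decision:
    1. add an Exchange if the child is not partitioned by required_keys;
    2. add a Sort if the (possibly new) child is not sorted by required_keys."""
    keys = tuple(required_keys)
    if child.partitioned_by != keys:
        # A plain shuffle reports no ordering; a hypothetical remote-merge
        # shuffle reports that each partition is already sorted by the keys.
        child = Node("Exchange", (child,), keys, keys if sorted_shuffle else ())
    if not ordering_satisfies(child.ordering, keys):
        child = Node("Sort", (child,), child.partitioned_by, keys)
    return child
```

With a plain shuffle, `ensure_requirements(Node("Scan"), ("k",))` returns a `Sort` on top of an `Exchange`; with `sorted_shuffle=True` the top node is the `Exchange` itself and no sort is inserted.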

Mar 18 '25 08:03 zhengchenyu

also cc @summaryzb

Mar 18 '25 08:03 LuciferYang

From my perspective, this feature currently can't be used in Spark SQL. Perhaps the RDD API could use it.

This test is based on the draft PR apache/spark#50248.

Does this one depend on SPARK-51398 being merged first?

Mar 18 '25 09:03 LuciferYang

From my perspective, this feature currently can't be used in Spark SQL. Perhaps the RDD API could use it.

This test is based on the draft PR apache/spark#50248.

This would break Spark's existing implementation. It would be better to insert a new logical plan node that represents the distribution and partitioning after shuffling; then you would only need to implement some optimization rules.

Are you talking about the changes to Spark? My initial idea was also to see whether I could add a new rule. For the map side, I could perhaps add new rules. But on the reduce side, whether a new SortExec is added depends on whether the required distribution and partitioning match, which is hard to control by adding a new rule. As for the draft PR with changes to Spark, it is only a draft to verify the feasibility of this proposal, and some of its code architecture still needs to be refactored. For example, some of the in-memory partial aggregation logic needs rework, and some logic should be added to the rule.

Yes, Meta's Cosco did something similar before. See https://github.com/apache/spark/pull/32944 and https://github.com/apache/spark/pull/34702.

cc @c21 Sorry to bother you. Is it possible to implement this feature by only adding some rules, without changing Spark's code? Could you give us some suggestions?

Mar 18 '25 09:03 jerqi
