incubator-uniffle
[#1750] feat(remote merge): Support Spark.
What changes were proposed in this pull request?
Support the Spark framework.
Why are the changes needed?
#1750
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests, integration tests, and a real job on a cluster.
Test Results
3 029 files +18, 3 029 suites +18, 6h 41m 32s ⏱️ +1m 40s
1 182 tests +7: 1 180 ✅ +7, 2 💤 ±0, 0 ❌ ±0
14 961 runs +62: 14 931 ✅ +62, 30 💤 ±0, 0 ❌ ±0
Results for commit 0c89203b. ± Comparison against base commit 4300f93f.
:recycle: This comment has been updated with latest results.
As far as I can see, this feature can't currently be used with Spark SQL. RDD jobs may be able to use it.
This test is based on the draft PR https://github.com/apache/spark/pull/50248.
cc @LuciferYang
This would break Spark's existing implementation. It would be better to insert a new logical plan node that represents the distribution and partitioning after the shuffle. Then you would only need to implement some optimization rules.
Are you talking about the changes to Spark? My initial idea was also to see whether I could add a new rule. For the map side, I could probably add new rules. On the reduce side, however, whether a new SortExec is added depends on whether the distribution and partitioning match, which is not easy to achieve by adding a new rule. As for the draft PR with the Spark changes, it is only a draft to verify the feasibility of this proposal. Some of the code architecture still needs to be refactored, for example some of the in-memory partial aggregation logic, and adding some logic to the rule.
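To make the reduce-side point concrete, here is a minimal, self-contained sketch of the kind of check involved. It is a toy model, not Spark code: `Plan`, `orderingSatisfies`, `ensureOrdering`, and the node name `RemoteMergedShuffleRead` are all hypothetical names chosen for illustration. It assumes that a sort is added only when the child's reported output ordering does not already satisfy the ordering the parent requires, and that a remote-merged shuffle read could report its output as already ordered.

```java
import java.util.List;

public class SortInsertionSketch {
    // Hypothetical stand-in for a physical plan node; not a Spark class.
    static final class Plan {
        final String name;
        final List<String> outputOrdering; // column names the output is sorted by
        final Plan child;

        Plan(String name, List<String> outputOrdering, Plan child) {
            this.name = name;
            this.outputOrdering = outputOrdering;
            this.child = child;
        }
    }

    // Toy rule: the requirement is met when the required columns
    // form a prefix of the ordering the child already provides.
    static boolean orderingSatisfies(List<String> actual, List<String> required) {
        if (required.size() > actual.size()) {
            return false;
        }
        return actual.subList(0, required.size()).equals(required);
    }

    // Wrap the child in a Sort node only when its ordering is insufficient.
    static Plan ensureOrdering(Plan child, List<String> required) {
        if (orderingSatisfies(child.outputOrdering, required)) {
            return child;
        }
        return new Plan("Sort", required, child);
    }

    public static void main(String[] args) {
        // A plain shuffle read reports no ordering, so a sort is added.
        Plan plain = new Plan("ShuffleRead", List.of(), null);
        // Assumption: a remote-merged read reports that it is sorted by key "k".
        Plan merged = new Plan("RemoteMergedShuffleRead", List.of("k"), null);

        System.out.println(ensureOrdering(plain, List.of("k")).name);
        System.out.println(ensureOrdering(merged, List.of("k")).name);
    }
}
```

In this toy model the decision depends on what the child node reports about its own output, which is why it is hard to influence from a separate rule that does not control that node.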
also cc @summaryzb
Does this one depend on SPARK-51398 being merged first?
Yes, I mean the changes to Spark. Meta's Cosco once did something similar; see https://github.com/apache/spark/pull/32944 and https://github.com/apache/spark/pull/34702.
cc @c21 Sorry to bother you. Would it be possible to implement this feature without changing Spark's code, by only adding some rules? Could you give us some suggestions?