[#1020] feat(spark): Support hybrid shuffle manager
What changes were proposed in this pull request?
Delegation shuffle manager has been supported in uniffle, which is useful for avoiding instability of uniffle cluster.
But this is not enough, we hope the hybrid shuffle manager could be supported, that will support using ESS or Uniffle on different stages in one App
Why are the changes needed?
- Add HybridShuffleManager.
- Add UnitTests.
Fix: https://github.com/apache/incubator-uniffle/issues/1020
Does this PR introduce any user-facing change?
(Please list the user-facing changes introduced by your change, including
- Change in user-facing APIs.
- Addition or removal of property keys.)
No.
How was this patch tested?
(Please test your changes, and provide instructions on how to test it:
- If you add a feature or fix a bug, add a test to cover your changes.
- If you fix a flaky test, repeat it for many times to prove it works.)
try to add feature: https://github.com/apache/incubator-uniffle/issues/1020
Hi, @zuston , @jerqi , I have a naive question. How to decide on the shuffle strategy (ESS/RSS/Uniffle) of each shuffleId for spark driver/executor? When to change RSS to ESS is our best practise?
You can see the DelegationShuffleManager design, the pass or not depends on the load or other access checker of coordinator determines
Hi, @zuston , I created a DRAFT PR for this issue. In HybridShuffleManager, I created ESS/RSS ShuffleManager for both, Each time would be decided by Coordinator's response to decide if we should use ESS/RSS.
Am I resolving this issue in right way you think?
Codecov Report
Merging #1136 (0e6bc51) into master (931d6cd) will increase coverage by
0.87%. The diff coverage is51.54%.
@@ Coverage Diff @@
## master #1136 +/- ##
============================================
+ Coverage 54.12% 54.99% +0.87%
- Complexity 2516 2692 +176
============================================
Files 383 378 -5
Lines 21618 20988 -630
Branches 1841 1994 +153
============================================
- Hits 11700 11543 -157
+ Misses 9186 8739 -447
+ Partials 732 706 -26
| Files | Coverage Δ | |
|---|---|---|
| .../apache/spark/shuffle/HybridRssShuffleManager.java | 65.87% <65.87%> (ø) |
|
| ...org/apache/spark/shuffle/HybridShuffleManager.java | 40.60% <40.60%> (ø) |
... and 35 files with indirect coverage changes
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
A use of rss or ess needs to be bound to a shuffle, which is determined by registerShuffle .
Feel free to discuss if you want to know more about this. Thanks @pegasas
A use of rss or ess needs to be bound to a shuffle, which is determined by
registerShuffle.Feel free to discuss if you want to know more about this. Thanks @pegasas
Thanks @zuston !
I guess if Shufflemaster failed, it will reassign ESS/RSS to an shufflehandle So (ShuffleId, ESS/RSS) should be saved in memory?
Is my assumption right?
A use of rss or ess needs to be bound to a shuffle, which is determined by
registerShuffle. Feel free to discuss if you want to know more about this. Thanks @pegasasThanks @zuston !
I guess if Shufflemaster failed, it will reassign ESS/RSS to an shufflehandle So (ShuffleId, ESS/RSS) should be saved in memory?
Is my assumption right?
Yes.
Is this ready for review ? @pegasas
Is this ready for review ? @pegasas
I am writing an integration test of this case.
Besides,
What's our roadmap of HybridShuffleManager and DelegateShuffleManager? Will HybridShuffleManager will replace delegateShuffleManager eventually?
What's our roadmap of HybridShuffleManager and DelegateShuffleManager? Will HybridShuffleManager will replace delegateShuffleManager eventually?
I think yes. But in current stage, let's reserve it.