[FEATURE] Support hybrid shuffle manager

Open zuston opened this issue 2 years ago • 8 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the feature

A delegation shuffle manager is already supported in Uniffle, which helps protect applications from instability of the Uniffle cluster.

But this is not enough. We hope a hybrid shuffle manager can also be supported, so that a single app can use ESS or Uniffle on different stages.

Motivation

No response

Describe the solution

No response

Additional context

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

zuston avatar Jul 18 '23 02:07 zuston

cc @jerqi @advancedxy

zuston avatar Jul 18 '23 02:07 zuston

The biggest problem is exchange reuse and stage recompute. How to handle them?

jerqi avatar Jul 18 '23 03:07 jerqi

The biggest problem is exchange reuse and stage recompute. How to handle them?

Sorry, I don't get your point. BTW, I tested the exchange reuse and it works well when using HybridShuffleManager.

From my perspective, each shuffleId uses only one of ESS or RSS, so it should always be OK. Right? If I'm wrong, please point it out. Thanks.

zuston avatar Jul 18 '23 09:07 zuston

The biggest problem is exchange reuse and stage recompute. How to handle them?

Sorry, I don't get your point. BTW, I tested the exchange reuse and it works well when using HybridShuffleManager.

From my perspective, each shuffleId uses only one of ESS or RSS, so it should always be OK. Right? If I'm wrong, please point it out. Thanks.

If a shuffle is bound to either ESS or RSS, it's OK. I'm worried about a shuffle changing from RSS to ESS.
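
The invariant under discussion, that a shuffleId stays on the backend it was first assigned, could be sketched roughly as below. `StickyBackendRegistry`, `Backend`, and the chooser function are hypothetical names for illustration only, not existing Uniffle or Spark classes, and the sketch says nothing about how stage recompute would be handled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: pins each shuffleId to one backend for its whole lifetime. */
final class StickyBackendRegistry {
    /** The two shuffle backends discussed in this thread. */
    enum Backend { ESS, RSS }

    private final Map<Integer, Backend> bindings = new ConcurrentHashMap<>();
    private final Function<Integer, Backend> chooser;

    /** chooser is an assumed policy that picks a backend for a new shuffleId. */
    StickyBackendRegistry(Function<Integer, Backend> chooser) {
        this.chooser = chooser;
    }

    /** The chooser runs only on first registration; later calls return the pinned backend. */
    Backend register(int shuffleId) {
        return bindings.computeIfAbsent(shuffleId, chooser);
    }

    /** Readers and writers look up the pinned backend instead of re-deciding. */
    Backend lookup(int shuffleId) {
        Backend backend = bindings.get(shuffleId);
        if (backend == null) {
            throw new IllegalStateException("shuffle " + shuffleId + " was never registered");
        }
        return backend;
    }

    public static void main(String[] args) {
        boolean[] rssAvailable = {true};
        StickyBackendRegistry registry =
            new StickyBackendRegistry(id -> rssAvailable[0] ? Backend.RSS : Backend.ESS);
        System.out.println(registry.register(0)); // chosen while RSS is available
        rssAvailable[0] = false;
        System.out.println(registry.register(0)); // same shuffleId keeps its backend
        System.out.println(registry.register(1)); // a new shuffleId gets a fresh decision
    }
}
```

With a registry like this, a change in the policy's answer only affects shuffles registered afterwards, which is the "bound to one backend" case. The case of a shuffle moving from RSS to ESS would need extra handling that this sketch does not attempt.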

jerqi avatar Jul 18 '23 10:07 jerqi

cc @jerqi @advancedxy

I think this would be a nice feature to have. It might also be related to stage resubmission.

advancedxy avatar Jul 18 '23 11:07 advancedxy

Hi, community,

I would like to try this issue. Would you like to assign this task to me?

It seems I should add a HybridShuffleManager, with tests and integration tests like those for DelegationRssShuffleManager/RssShuffleManager. This would be a good opportunity for me to go deeper into Uniffle.

pegasas avatar Aug 08 '23 18:08 pegasas

Hi, @zuston , @jerqi ,

I have a naive question. How is the shuffle strategy (ESS/RSS/Uniffle) decided for each shuffleId on the Spark driver/executor? Is there a best practice for when to switch from RSS to ESS?

pegasas avatar Aug 11 '23 01:08 pegasas

Hi, @zuston , @jerqi ,

I have a naive question. How is the shuffle strategy (ESS/RSS/Uniffle) decided for each shuffleId on the Spark driver/executor? Is there a best practice for when to switch from RSS to ESS?

You can look at the DelegationShuffleManager design: whether a request passes depends on the cluster load or on the coordinator's other access checkers.
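
As a rough illustration of that kind of decision, here is a minimal sketch that uses RSS only when every check passes and falls back to ESS otherwise. The `AccessChecker` interface, the `decide` method, and the `appId` argument are assumptions made for this example. They are not the actual Uniffle coordinator API, and the real checks run on the coordinator rather than in the client.

```java
import java.util.List;

/** Hypothetical sketch of a delegation-style decision; not Uniffle's actual API. */
final class DelegationDecisionSketch {
    enum Backend { ESS, RSS }

    /** Stand-in for one coordinator-side check, such as a cluster load check. */
    interface AccessChecker {
        boolean allows(String appId);
    }

    /**
     * Returns RSS only if every checker passes; the first rejection falls back to ESS.
     * An empty checker list is treated as "no objection" and yields RSS.
     */
    static Backend decide(String appId, List<AccessChecker> checkers) {
        for (AccessChecker checker : checkers) {
            if (!checker.allows(appId)) {
                return Backend.ESS;
            }
        }
        return Backend.RSS;
    }

    public static void main(String[] args) {
        AccessChecker loadIsFine = appId -> true;   // pretend the cluster has capacity
        AccessChecker rejectsAll = appId -> false;  // pretend another check denies access
        System.out.println(decide("app-1", List.of(loadIsFine)));
        System.out.println(decide("app-1", List.of(loadIsFine, rejectsAll)));
    }
}
```

A hybrid manager could presumably apply the same kind of check per shuffle instead of once per application, but that is a design question for this issue rather than something the sketch settles.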

zuston avatar Aug 11 '23 01:08 zuston