[FEATURE] Trigger partition split when the shuffle-server is unhealthy due to the insufficient capacity

Open zuston opened this issue 3 months ago • 1 comments

Code of Conduct

Search before asking

  • [x] I have searched in the issues and found no similar issues.

Describe the feature

We found that a shuffle-server slows down all of the Spark jobs it serves when its localfile store runs short of capacity. A single large Spark app can also cause this.

Based on the above observation, we should introduce a more aggressive partition split strategy.
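
As a rough illustration of the idea, the sketch below treats a partition as a split candidate either when it has grown too large or when the server's localfile store is close to full. It is not taken from the Uniffle or riffle code: `ServerDiskStatus`, `should_split_partition`, `high_watermark`, and `size_limit_bytes` are hypothetical names and thresholds chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class ServerDiskStatus:
    """Hypothetical snapshot of a shuffle-server's localfile store."""
    used_bytes: int
    capacity_bytes: int


def should_split_partition(
    status: ServerDiskStatus,
    partition_bytes: int,
    high_watermark: float = 0.9,          # assumed threshold, not a real config key
    size_limit_bytes: int = 20 * 1024**3,  # assumed per-partition limit
) -> bool:
    """Return True when new writes for this partition should go to another server."""
    # A store with no usable capacity is treated as unhealthy.
    if status.capacity_bytes <= 0:
        return True

    usage_ratio = status.used_bytes / status.capacity_bytes

    # Size-based trigger: the partition itself is too large.
    oversized = partition_bytes >= size_limit_bytes

    # Capacity-based trigger: the server is nearly out of local disk,
    # so split even if the partition is still small.
    capacity_unhealthy = usage_ratio >= high_watermark

    return oversized or capacity_unhealthy
```

With these assumed defaults, a small partition on a server at 95% disk usage would be split, while the same partition on a server at 50% usage would not.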

Motivation

No response

Describe the solution

No response

Additional context

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

zuston avatar Dec 01 '25 08:12 zuston

I have implemented this on the riffle side: https://github.com/zuston/riffle/pull/532

zuston avatar Dec 01 '25 08:12 zuston