incubator-uniffle
[FEATURE] Trigger partition split when the shuffle-server is unhealthy due to insufficient capacity
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Search before asking
- [x] I have searched in the issues and found no similar issues.
Describe the feature
We found that a shuffle-server slows down all of its corresponding Spark jobs when its localfile store has insufficient capacity, which can also be caused by a large Spark app.
Based on the above observation, we should introduce a more aggressive partition split strategy.
Motivation
No response
Describe the solution
No response
Additional context
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
I have implemented this on the riffle side: https://github.com/zuston/riffle/pull/532