incubator-uniffle
[Bug] Incorrect stage retry condition for fetch failure
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Search before asking
- [X] I have searched in the issues and found no similar issues.
Describe the bug
In the current codebase, a stage retry is triggered when the number of fetch failures for a partitionId reaches spark.task.maxFailures. Without AQE, this condition is correct. With AQE enabled, this assumption is wrong.
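To make the condition concrete, here is a minimal sketch of the per-partition check described above, next to a per-task count for contrast. It is not Uniffle's actual code: the class and method names are hypothetical, and the per-task variant is only one possible alternative, not a proposed fix. The sketch assumes the problem under AQE is that several reduce tasks can read the same shuffle partition (for example when a skewed partition is split), so failures from different tasks accumulate on one partitionId.

```python
from collections import defaultdict


class FetchFailureTracker:
    """Hypothetical model of the retry condition; names are illustrative only."""

    def __init__(self, max_failures):
        # Stands in for the value of spark.task.maxFailures.
        self.max_failures = max_failures
        self.failures_by_partition = defaultdict(int)
        self.failures_by_task = defaultdict(int)

    def record_fetch_failure(self, task_id, partition_id):
        # Count the same failure under both keys so the two conditions can be compared.
        self.failures_by_partition[partition_id] += 1
        self.failures_by_task[task_id] += 1

    def should_retry_by_partition(self):
        # The condition described in this issue: keyed on partitionId.
        return any(n >= self.max_failures for n in self.failures_by_partition.values())

    def should_retry_by_task(self):
        # Contrast only: keyed on the task that hit the failure.
        return any(n >= self.max_failures for n in self.failures_by_task.values())


# Assumed AQE-like scenario: four different tasks each fail once on partition 7.
tracker = FetchFailureTracker(max_failures=4)
for task_id in range(4):
    tracker.record_fetch_failure(task_id, partition_id=7)

print(tracker.should_retry_by_partition())
print(tracker.should_retry_by_task())
```

In this scenario the per-partition condition fires even though no single task has failed spark.task.maxFailures times, which is the kind of mismatch this issue is about.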
Affects Version(s)
master
Uniffle Server Log Output
No response
Uniffle Engine Log Output
No response
Uniffle Server Configurations
No response
Uniffle Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Could you help check this?