[#1608][part-8] feat(spark3): add a limit to the number of retries when block access is denied
What changes were proposed in this pull request?
In https://github.com/apache/incubator-uniffle/issues/1617: if the servers in the cluster become unhealthy very frequently (for example, because of high load, high network card traffic, or disk utilization exceeding its threshold), the blockFailSentRetryMaxTimes limit in reassign can easily be reached. Therefore, this PR adds a separate limit on the number of retries when block access is denied.
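A minimal Java sketch of the proposed idea, assuming the client can classify each failed block send as either "access denied" or some other error. The class, enum, and field names below are hypothetical and are not taken from the Uniffle code base; `maxRetryTimes` merely stands in for the existing blockFailSentRetryMaxTimes budget.

```java
/**
 * Hypothetical sketch: tracks resend attempts for a single block and keeps
 * access-denied rejections on their own budget, so that frequent server-side
 * rejections do not exhaust the general retry budget.
 * Names and semantics are illustrative assumptions, not Uniffle's API.
 */
public class BlockRetryBudget {

  /** Hypothetical failure categories for one send attempt. */
  public enum Failure { ACCESS_DENIED, OTHER }

  private final int maxRetryTimes;             // stands in for blockFailSentRetryMaxTimes
  private final int maxAccessDeniedRetryTimes; // the separate cap proposed in this PR (hypothetical name)
  private int retryTimes = 0;
  private int accessDeniedRetryTimes = 0;

  public BlockRetryBudget(int maxRetryTimes, int maxAccessDeniedRetryTimes) {
    this.maxRetryTimes = maxRetryTimes;
    this.maxAccessDeniedRetryTimes = maxAccessDeniedRetryTimes;
  }

  /** Records one failure and returns true if the block may still be resent. */
  public boolean onFailure(Failure failure) {
    if (failure == Failure.ACCESS_DENIED) {
      // Access-denied rejections only consume the dedicated budget.
      accessDeniedRetryTimes++;
      return accessDeniedRetryTimes <= maxAccessDeniedRetryTimes;
    }
    // Every other failure consumes the general budget.
    retryTimes++;
    return retryTimes <= maxRetryTimes;
  }

  public static void main(String[] args) {
    BlockRetryBudget budget = new BlockRetryBudget(1, 3);
    // Three access-denied rejections are tolerated without touching the general budget.
    for (int i = 0; i < 3; i++) {
      System.out.println(budget.onFailure(Failure.ACCESS_DENIED)); // true
    }
    System.out.println(budget.onFailure(Failure.ACCESS_DENIED)); // false: dedicated cap reached
    System.out.println(budget.onFailure(Failure.OTHER));         // true: general budget still unused
  }
}
```

Keeping two independent counters is what lets transient access-denied rejections avoid tripping the general limit; it is also why this approach partly overlaps with, and changes the meaning of, the existing blockMaxRetryTimes setting.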
Why are the changes needed?
Add a limit to the number of retries when block access is denied.
Fix: # (issue)
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Codecov Report
Attention: Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.
Project coverage is 54.05%. Comparing base (6f6d35a) to head (618fe20). Report is 21 commits behind head on master.
| Files | Patch % | Lines |
|---|---|---|
| ...va/org/apache/uniffle/common/ShuffleBlockInfo.java | 25.00% | 3 Missing :warning: |
| ...ffle/client/impl/grpc/ShuffleServerGrpcClient.java | 0.00% | 2 Missing :warning: |
Additional details and impacted files
```diff
@@ Coverage Diff @@
## master #1715 +/- ##
============================================
- Coverage 54.86% 54.05% -0.81%
- Complexity 2358 2763 +405
============================================
Files 368 414 +46
Lines 16379 21774 +5395
Branches 1504 2054 +550
============================================
+ Hits 8986 11770 +2784
- Misses 6862 9260 +2398
- Partials 531 744 +213
```
Test Results
2 391 files ±0, 2 391 suites ±0, 4h 57m 58s :stopwatch: +25s
929 tests ±0: 928 :white_check_mark: ±0, 1 :zzz: ±0, 0 :x: ±0
10 763 runs ±0: 10 749 :white_check_mark: ±0, 14 :zzz: ±0, 0 :x: ±0
Results for commit 618fe205. ± Comparison against base commit de4b2611.
It looks strange that this replaces part of what blockMaxRetryTimes does and introduces an extra config option for this purpose; moreover, the change is not compatible with the previous logic.
What do you recommend? Or should we continue to reuse blockMaxRetryTimes?
After rethinking, I think we don't need this. If you want to increase retry count, you just need to increase the block retry max times.
OK, I will close it.