[#1608][part-8] feat(spark3): add a limit to the number of retries when block access is denied

Open dingshun3016 opened this issue 1 year ago • 4 comments

What changes were proposed in this pull request?

As described in https://github.com/apache/incubator-uniffle/issues/1617, if the servers in the cluster are frequently unhealthy (for example, under high load, high network card traffic, or disk utilization that triggers thresholds), block sends can easily hit the blockFailSentRetryMaxTimes limit during reassignment. Therefore, this PR adds a limit on the number of retries when block access is denied.
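To make the idea concrete, here is a minimal sketch of counting access-denied failures against their own limit so that they do not consume the general retry budget. It is not the code in this PR: the class `BlockRetryTracker`, the enum `FailureKind`, and the parameter `maxAccessDeniedRetryTimes` are hypothetical names chosen for illustration, and `maxRetryTimes` only stands in for the existing blockFailSentRetryMaxTimes limit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual Uniffle implementation.
class BlockRetryTracker {
    // Assumed failure categories; the real client may classify failures differently.
    enum FailureKind { ACCESS_DENIED, OTHER }

    private final int maxRetryTimes;             // stands in for blockFailSentRetryMaxTimes
    private final int maxAccessDeniedRetryTimes; // hypothetical separate limit
    private final Map<Long, Integer> generalFailures = new HashMap<>();
    private final Map<Long, Integer> accessDeniedFailures = new HashMap<>();

    BlockRetryTracker(int maxRetryTimes, int maxAccessDeniedRetryTimes) {
        this.maxRetryTimes = maxRetryTimes;
        this.maxAccessDeniedRetryTimes = maxAccessDeniedRetryTimes;
    }

    // Records one failed send for the block and returns true if the block
    // may still be resent under the limit that applies to this failure kind.
    boolean recordFailureAndCheckRetry(long blockId, FailureKind kind) {
        boolean denied = kind == FailureKind.ACCESS_DENIED;
        Map<Long, Integer> counters = denied ? accessDeniedFailures : generalFailures;
        int limit = denied ? maxAccessDeniedRetryTimes : maxRetryTimes;
        // merge() inserts 1 on the first failure and increments afterwards.
        int failures = counters.merge(blockId, 1, Integer::sum);
        return failures <= limit;
    }
}
```

With `new BlockRetryTracker(1, 3)`, a block can be retried after each of its first three access-denied failures, while a single failure of any other kind already uses up the general budget.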

Why are the changes needed?

Add a limit to the number of retries when block access is denied.

Fix: # (issue)

Does this PR introduce any user-facing change?

No.

How was this patch tested?

dingshun3016 avatar May 16 '24 03:05 dingshun3016

Codecov Report

Attention: Patch coverage is 54.54545%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 54.05%. Comparing base (6f6d35a) to head (618fe20). Report is 21 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...va/org/apache/uniffle/common/ShuffleBlockInfo.java | 25.00% | 3 Missing :warning: |
| ...ffle/client/impl/grpc/ShuffleServerGrpcClient.java | 0.00% | 2 Missing :warning: |

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1715      +/-   ##
============================================
- Coverage     54.86%   54.05%   -0.81%     
- Complexity     2358     2763     +405     
============================================
  Files           368      414      +46     
  Lines         16379    21774    +5395     
  Branches       1504     2054     +550     
============================================
+ Hits           8986    11770    +2784     
- Misses         6862     9260    +2398     
- Partials        531      744     +213     


codecov-commenter avatar May 16 '24 03:05 codecov-commenter

Test Results

2 391 files ±0, 2 391 suites ±0, 4h 57m 58s :stopwatch: +25s
929 tests ±0: 928 :white_check_mark: ±0, 1 :zzz: ±0, 0 :x: ±0
10 763 runs ±0: 10 749 :white_check_mark: ±0, 14 :zzz: ±0, 0 :x: ±0

Results for commit 618fe205. ± Comparison against base commit de4b2611.

github-actions[bot] avatar May 16 '24 03:05 github-actions[bot]

This looks strange: it replaces part of what blockMaxRetryTimes does and introduces an extra config option to serve a specific case, and the change is not compatible with the previous logic.

zuston avatar May 16 '24 07:05 zuston

What do you recommend? Should I continue to reuse blockMaxRetryTimes instead?

dingshun3016 avatar May 16 '24 08:05 dingshun3016

On reflection, I don't think we need this. If you want a higher retry count, you just need to increase the block retry max times.

zuston avatar May 22 '24 09:05 zuston

OK, I will close it.

dingshun3016 avatar May 22 '24 10:05 dingshun3016