[#731][FOLLOWUP] feat(Spark): Configure blockIdLayout for Spark based on max partitions
What changes were proposed in this pull request?
The configuration of the block id layout for Spark2 and Spark3 can be simplified by only providing the maximal number of partitions.
Why are the changes needed?
Currently, optimally configuring the block id layout for Spark is quite complex: https://github.com/apache/incubator-uniffle/pull/1528/files#diff-09ce7eaa98815d62ca00b2a8b0a45b0a922b047014c1f91dc17081b3fef8e7a8R106-R112
Three values have to be provided: the number of bits for sequence number, partition id and task attempt id. The task attempt id has to provide more bits than the partition id, in the sense that the maximal number of attempts can be stored. This requires to also account for speculative execution.
The RssShuffleManager can do the computation and derive the optimal block id layout configuration from the maximal number of partitions only.
Does this PR introduce any user-facing change?
Adds optional configuration spark.rss.blockId.maxPartitions.
How was this patch tested?
Unit and integration tests.
Test Results
2 326 files ± 0 2 326 suites ±0 4h 29m 14s :stopwatch: -12s 902 tests + 51 900 :white_check_mark: + 51 2 :zzz: ±0 0 :x: ±0 10 477 runs +459 10 454 :white_check_mark: +459 23 :zzz: ±0 0 :x: ±0
Results for commit 1aa1172d. ± Comparison against base commit d994b272.
:recycle: This comment has been updated with latest results.
I will review in this weekend.
Codecov Report
Attention: Patch coverage is 89.32039% with 11 lines in your changes are missing coverage. Please review.
Project coverage is 54.83%. Comparing base (
d994b27) to head (1aa1172).
| Files | Patch % | Lines |
|---|---|---|
| ...pache/uniffle/server/ShuffleServerGrpcService.java | 0.00% | 10 Missing :warning: |
| ...uniffle/shuffle/manager/RssShuffleManagerBase.java | 98.87% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #1566 +/- ##
============================================
+ Coverage 53.72% 54.83% +1.11%
- Complexity 2838 2855 +17
============================================
Files 437 417 -20
Lines 24663 22387 -2276
Branches 2094 2103 +9
============================================
- Hits 13250 12276 -974
+ Misses 10581 9349 -1232
+ Partials 832 762 -70
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.