pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Add Segment Selection Randomization to UpsertCompactionTask

Open tarun11Mavani opened this issue 4 months ago • 3 comments

Problem

UpsertCompactionTask consistently selects the same top segments with the highest invalid record counts for compaction. When some of these top segments encounter issues (corrupted data, processing failures, etc.) dur, they can block the entire compaction queue, preventing other segments from being compacted and reducing overall system throughput. We ran into a similar issue when some of the deepstore copies were corrupted and generator ended up selecting same set of segments that were corrupted and it ended up stopping compaction entirely for a table.

Solution

This PR introduces configurable randomization in segment selection to reduce contention and improve compaction resilience:

  • New Configuration Parameter: segmentSelectionRandomizationFactor (default: 2.0) When set to 2.0, selects from top (maxTasks × 2.0) segments as candidates, then randomly picks maxTasks from them
  • Values ≤ 1.0 disable randomization (deterministic behavior)
  • Uses reservoir sampling algorithm for O(n) time complexity instead of expensive shuffling operations
  • Implements the randomization logic in BaseTaskGenerator as a reusable method for other task generators

tarun11Mavani avatar Nov 02 '25 16:11 tarun11Mavani

Codecov Report

:x: Patch coverage is 74.19355% with 8 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.21%. Comparing base (ceb2206) to head (130e3bb). :warning: Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...psertcompaction/UpsertCompactionTaskGenerator.java 53.33% 7 Missing :warning:
...helix/core/minion/generator/BaseTaskGenerator.java 93.75% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17126      +/-   ##
============================================
+ Coverage     63.18%   63.21%   +0.02%     
- Complexity     1428     1430       +2     
============================================
  Files          3104     3110       +6     
  Lines        183488   183811     +323     
  Branches      28127    28163      +36     
============================================
+ Hits         115943   116198     +255     
- Misses        58587    58626      +39     
- Partials       8958     8987      +29     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 ?
java-11 63.14% <70.96%> (-0.02%) :arrow_down:
java-21 63.18% <74.19%> (+0.03%) :arrow_up:
temurin 63.21% <74.19%> (+0.02%) :arrow_up:
unittests 63.21% <74.19%> (+0.02%) :arrow_up:
unittests1 56.08% <ø> (-0.12%) :arrow_down:
unittests2 33.70% <74.19%> (+0.10%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Nov 04 '25 10:11 codecov-commenter

@abhishekbafna @swaminathanmanish can you please take a look at this?

tarun11Mavani avatar Nov 10 '25 15:11 tarun11Mavani

@abhishekbafna @swaminathanmanish gentle bump on this.

tarun11Mavani avatar Nov 20 '25 05:11 tarun11Mavani