
Adaptive merge rollup segment sizing

Open davecromberge opened this issue 2 months ago • 1 comment

Summary

Enables size-based segment generation for tables with variable-sized data (e.g., Theta sketches), where a static row count per segment produces segments of inconsistent size. Also adds size-based segment grouping for MergeRollupTask.
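Size-based grouping can be pictured as greedy packing. The sketch below is hypothetical: it assumes segments arrive already ordered (for example by time), packs them into groups whose total size stays within a byte cap such as maxSegmentSizeBytesPerTask from the example config, and gives a segment that exceeds the cap on its own a group of one. The class, record, and method names are illustrative, not Pinot's MergeRollupTaskGenerator code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of size-based task grouping; not Pinot's actual generator code. */
public final class SizeBasedGrouping {

  /** Hypothetical stand-in for segment metadata: only the name and size in bytes are needed. */
  record SegmentInfo(String name, long sizeBytes) {}

  /** Greedily packs ordered segments into groups whose total size stays within maxBytesPerTask. */
  static List<List<SegmentInfo>> group(List<SegmentInfo> segments, long maxBytesPerTask) {
    List<List<SegmentInfo>> groups = new ArrayList<>();
    List<SegmentInfo> current = new ArrayList<>();
    long currentBytes = 0;
    for (SegmentInfo segment : segments) {
      // Close the current group if adding this segment would exceed the cap.
      // A segment larger than the cap by itself still gets its own group.
      if (!current.isEmpty() && currentBytes + segment.sizeBytes() > maxBytesPerTask) {
        groups.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(segment);
      currentBytes += segment.sizeBytes();
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<SegmentInfo> segments = List.of(
        new SegmentInfo("seg_0", 3_000_000L),
        new SegmentInfo("seg_1", 1_000_000L),
        new SegmentInfo("seg_2", 2_000_000L),
        new SegmentInfo("seg_3", 5_000_000L));
    // Cap taken from maxSegmentSizeBytesPerTask in the example config.
    for (List<SegmentInfo> g : group(segments, 4_194_000L)) {
      System.out.println(g.stream().map(SegmentInfo::name).toList());
    }
    // Prints [seg_0, seg_1], then [seg_2], then [seg_3].
  }
}
```

Greedy packing preserves input order, which matters if each merged segment should cover a contiguous time range; a bin-packing heuristic could fill groups more tightly at the cost of mixing ranges.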

Implemented two strategies:

  • AdaptiveSegmentNumRowProvider: learning based on an exponential moving average (EMA), for homogeneous data
  • PercentileAdaptiveSegmentNumRowProvider: Reservoir sampling with percentile estimation for heterogeneous/multi-tenant data (resistant to outliers)
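The two strategies can be sketched as follows. This is a minimal, hypothetical illustration that assumes each provider learns a bytes-per-row estimate from segments it has already built and divides desiredSegmentSizeBytes by that estimate. The interface, class names, smoothing factor, reservoir capacity, fixed random seed, and nearest-rank percentile are all assumptions, not the PR's code.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of the two sizing strategies; not the PR's actual classes. */
public final class RowCountProviders {

  /** Hypothetical contract: learn from each built segment, then suggest rows for the next one. */
  interface NumRowProvider {
    void observe(long numRows, long sizeBytes);

    long nextNumRows();
  }

  /** Exponential moving average of bytes per row; suits homogeneous data. */
  static final class EmaProvider implements NumRowProvider {
    private final long desiredSegmentSizeBytes;
    private final double alpha;        // weight of the newest observation, in (0, 1]
    private final long initialNumRows; // used until the first segment is observed
    private double emaBytesPerRow = Double.NaN;

    EmaProvider(long desiredSegmentSizeBytes, double alpha, long initialNumRows) {
      this.desiredSegmentSizeBytes = desiredSegmentSizeBytes;
      this.alpha = alpha;
      this.initialNumRows = initialNumRows;
    }

    @Override
    public void observe(long numRows, long sizeBytes) {
      if (numRows <= 0 || sizeBytes <= 0) {
        return; // ignore empty or unmeasured segments
      }
      double bytesPerRow = (double) sizeBytes / numRows;
      emaBytesPerRow = Double.isNaN(emaBytesPerRow)
          ? bytesPerRow
          : alpha * bytesPerRow + (1 - alpha) * emaBytesPerRow;
    }

    @Override
    public long nextNumRows() {
      if (Double.isNaN(emaBytesPerRow)) {
        return initialNumRows;
      }
      return Math.max(1L, (long) (desiredSegmentSizeBytes / emaBytesPerRow));
    }
  }

  /** Reservoir sample (Algorithm R) of bytes per row, sized from a percentile of the sample. */
  static final class PercentileProvider implements NumRowProvider {
    private final long desiredSegmentSizeBytes;
    private final double percentile; // e.g. 75.0, as in sizingPercentile
    private final long initialNumRows;
    private final double[] reservoir;
    private final Random random = new Random(42); // fixed seed keeps the sketch deterministic
    private long seen;

    PercentileProvider(long desiredSegmentSizeBytes, double percentile, long initialNumRows, int capacity) {
      this.desiredSegmentSizeBytes = desiredSegmentSizeBytes;
      this.percentile = percentile;
      this.initialNumRows = initialNumRows;
      this.reservoir = new double[capacity];
    }

    @Override
    public void observe(long numRows, long sizeBytes) {
      if (numRows <= 0 || sizeBytes <= 0) {
        return;
      }
      double bytesPerRow = (double) sizeBytes / numRows;
      if (seen < reservoir.length) {
        reservoir[(int) seen] = bytesPerRow; // fill phase
      } else {
        // Pick j uniformly in [0, seen]; replacing slot j when j < capacity keeps a uniform sample.
        long j = (long) (random.nextDouble() * (seen + 1));
        if (j < reservoir.length) {
          reservoir[(int) j] = bytesPerRow;
        }
      }
      seen++;
    }

    @Override
    public long nextNumRows() {
      int n = (int) Math.min(seen, reservoir.length);
      if (n == 0) {
        return initialNumRows;
      }
      double[] sorted = Arrays.copyOf(reservoir, n);
      Arrays.sort(sorted);
      // Nearest-rank percentile of the sampled bytes-per-row values.
      int rank = (int) Math.ceil(percentile / 100.0 * n);
      double bytesPerRow = sorted[Math.min(n - 1, Math.max(0, rank - 1))];
      return Math.max(1L, (long) (desiredSegmentSizeBytes / bytesPerRow));
    }
  }

  public static void main(String[] args) {
    long desired = 209_715_200L; // desiredSegmentSizeBytes, as in the example config
    NumRowProvider ema = new EmaProvider(desired, 0.5, 10_000L);
    NumRowProvider pct = new PercentileProvider(desired, 75.0, 10_000L, 100);
    // Three segments at 100 bytes/row, then one outlier at 1,000 bytes/row.
    long[][] segments = {{1_000_000L, 100_000_000L}, {1_000_000L, 100_000_000L},
        {1_000_000L, 100_000_000L}, {1_000_000L, 1_000_000_000L}};
    for (long[] s : segments) {
      ema.observe(s[0], s[1]);
      pct.observe(s[0], s[1]);
    }
    System.out.println("EMA suggests " + ema.nextNumRows() + " rows"); // 381300
    System.out.println("P75 suggests " + pct.nextNumRows() + " rows"); // 2097152
  }
}
```

In this toy run a single outlier pulls the EMA suggestion down sharply, while the 75th percentile of the sample ignores it, which is the outlier resistance the bullet above describes.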

Configuration is read directly from the MergeRollupTask config map, following the eraseDimensionValues pattern. There are no changes to the shared SegmentConfig or to the framework.

Example config:
{
  "MergeRollupTask": {
    "maxSegmentSizeBytesPerTask": "4194000",
    "desiredSegmentSizeBytes": "209715200",
    "segmentSizingStrategy": "PERCENTILE",
    "sizingPercentile": "75"
  }
}
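Reading these keys from the task config map could look like the following hypothetical sketch. The key names and example values come from the config above (209715200 bytes is exactly 200 MiB); the Sizing record, the EMA strategy name (only PERCENTILE appears in the PR text), the defaults, and the validation rules are assumptions for illustration.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser for the sizing keys in a MergeRollupTask config map. */
public final class SizingConfig {

  // Only "PERCENTILE" appears in the PR text; EMA is a hypothetical name for the other strategy.
  enum Strategy { EMA, PERCENTILE }

  record Sizing(long maxSegmentSizeBytesPerTask, long desiredSegmentSizeBytes,
      Strategy strategy, double sizingPercentile) {}

  // Hypothetical defaults, chosen only for this sketch.
  static final long DEFAULT_DESIRED_SEGMENT_SIZE_BYTES = 200L * 1024 * 1024;
  static final double DEFAULT_SIZING_PERCENTILE = 75.0;

  static Sizing parse(Map<String, String> taskConfig) {
    // Table config values are strings, as in the example config.
    long maxBytesPerTask = Long.parseLong(
        taskConfig.getOrDefault("maxSegmentSizeBytesPerTask", String.valueOf(Long.MAX_VALUE)));
    long desired = Long.parseLong(taskConfig.getOrDefault(
        "desiredSegmentSizeBytes", String.valueOf(DEFAULT_DESIRED_SEGMENT_SIZE_BYTES)));
    Strategy strategy = Strategy.valueOf(
        taskConfig.getOrDefault("segmentSizingStrategy", "EMA").toUpperCase(Locale.ROOT));
    double percentile = Double.parseDouble(taskConfig.getOrDefault(
        "sizingPercentile", String.valueOf(DEFAULT_SIZING_PERCENTILE)));
    if (desired <= 0) {
      throw new IllegalArgumentException("desiredSegmentSizeBytes must be positive: " + desired);
    }
    if (percentile <= 0 || percentile > 100) {
      throw new IllegalArgumentException("sizingPercentile must be in (0, 100]: " + percentile);
    }
    return new Sizing(maxBytesPerTask, desired, strategy, percentile);
  }

  public static void main(String[] args) {
    Map<String, String> mergeRollupConfig = Map.of(
        "maxSegmentSizeBytesPerTask", "4194000",
        "desiredSegmentSizeBytes", "209715200",
        "segmentSizingStrategy", "PERCENTILE",
        "sizingPercentile", "75");
    Sizing sizing = parse(mergeRollupConfig);
    System.out.println(sizing);
    System.out.println(sizing.desiredSegmentSizeBytes() / (1024 * 1024) + " MiB"); // 200 MiB
  }
}
```

Failing fast on an out-of-range percentile surfaces config mistakes at task-generation time rather than producing oddly sized segments later.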

Instructions:

The PR has to be tagged with at least one of the following labels (*):

  • feature
  • performance
  • release-notes - New configuration options

davecromberge · Nov 19 '25 11:11

❌ 1 test failed:

Tests completed | Failed | Passed | Skipped
10107           | 1      | 10106  | 47

Top 3 failed tests, by shortest run time:

  • org.apache.pinot.plugin.minion.tasks.mergerollup.MergeRollupTaskGeneratorTest::testMaxSegmentSizeBytesPerTask (0.014s run time)
    expected [1] but found [2]
  • org.apache.pinot.controller.helix.core.minion.PinotTaskManagerDistributedLockingTest::testConcurrentCreateTaskFromMultipleControllers (9.07s run time)
    At least one task generation should have occurred expected [1] but found [2]
  • org.apache.pinot.controller.helix.core.minion.PinotTaskManagerDistributedLockingTest::testConcurrentCreateTaskFromMultipleControllers (12s run time)
    At least one task generation should have occurred expected [1] but found [2]


codecov-commenter · Nov 19 '25 12:11