Adaptive merge rollup segment sizing
Summary
Enables size-based segment generation for tables with variable-sized data (e.g., Theta sketches), where static row counts produce inconsistent segment sizes. Also adds size-based segment grouping for MergeRollupTask.
Two strategies are implemented:
- AdaptiveSegmentNumRowProvider: exponential moving average (EMA) based learning for homogeneous data
- PercentileAdaptiveSegmentNumRowProvider: reservoir sampling with percentile estimation for heterogeneous/multi-tenant data (resistant to outliers)
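As a rough illustration of the two strategies above, here is a minimal Java sketch. The class names `EmaRowEstimator` and `ReservoirPercentileEstimator`, the `alpha` smoothing factor, the reservoir capacity, and the nearest-rank percentile rule are all assumptions made for this example; they are not taken from the actual provider classes in this PR.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: tracks bytes-per-row with an exponential moving average. */
final class EmaRowEstimator {
  private final double alpha;          // smoothing factor in (0, 1]; assumed parameter
  private double emaBytesPerRow = -1;  // negative means "no observation yet"

  EmaRowEstimator(double alpha) {
    if (alpha <= 0 || alpha > 1) {
      throw new IllegalArgumentException("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  /** Feeds the observed size and row count of one generated segment. */
  void observe(long segmentSizeBytes, long numRows) {
    if (numRows <= 0) {
      return;
    }
    double sample = (double) segmentSizeBytes / numRows;
    emaBytesPerRow = emaBytesPerRow < 0 ? sample : alpha * sample + (1 - alpha) * emaBytesPerRow;
  }

  /** Rows expected to fit in the desired size; falls back until data is available. */
  long estimateRows(long desiredSegmentSizeBytes, long fallbackRows) {
    if (emaBytesPerRow <= 0) {
      return fallbackRows;
    }
    return Math.max(1L, (long) (desiredSegmentSizeBytes / emaBytesPerRow));
  }
}

/** Hypothetical sketch: keeps a uniform sample of bytes-per-row values (Algorithm R). */
final class ReservoirPercentileEstimator {
  private final double[] reservoir;
  private final Random random;
  private long seen;

  ReservoirPercentileEstimator(int capacity, long seed) {
    this.reservoir = new double[capacity];
    this.random = new Random(seed);
  }

  void observe(long segmentSizeBytes, long numRows) {
    if (numRows <= 0) {
      return;
    }
    double sample = (double) segmentSizeBytes / numRows;
    seen++;
    if (seen <= reservoir.length) {
      reservoir[(int) (seen - 1)] = sample;  // fill phase
    } else {
      // Pick a slot uniformly in [0, seen); keep the sample only if it lands in the reservoir.
      long slot = (long) (random.nextDouble() * seen);
      if (slot < reservoir.length) {
        reservoir[(int) slot] = sample;
      }
    }
  }

  /** Nearest-rank percentile of sampled bytes-per-row; returns -1 if nothing was observed. */
  double percentile(int p) {
    int n = (int) Math.min(seen, reservoir.length);
    if (n == 0) {
      return -1;
    }
    double[] sorted = Arrays.copyOf(reservoir, n);
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * n);
    return sorted[Math.max(0, rank - 1)];
  }

  long estimateRows(long desiredSegmentSizeBytes, int p, long fallbackRows) {
    double bytesPerRow = percentile(p);
    if (bytesPerRow <= 0) {
      return fallbackRows;
    }
    return Math.max(1L, (long) (desiredSegmentSizeBytes / bytesPerRow));
  }
}
```

In this sketch, a single very large row among nine small ones barely moves the 75th percentile, whereas an average-style estimate would be pulled toward the outlier. That is the intuition behind preferring a percentile for multi-tenant data.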
Configuration is read directly from the MergeRollupTask config map, following the eraseDimensionValues pattern. No changes are made to the shared SegmentConfig or the framework.
Example config:
{
  "MergeRollupTask": {
    "maxSegmentSizeBytesPerTask": "4194000",
    "desiredSegmentSizeBytes": "209715200",
    "segmentSizingStrategy": "PERCENTILE",
    "sizingPercentile": "75"
  }
}
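To make the config above more concrete, the following Java sketch shows one way such string-valued entries could be parsed from a task config map, and how a per-task byte cap could drive size-based grouping. The key names come from the example config; the class names `MergeRollupSizingConfig` and `SizeBasedGrouper`, the default values, and the greedy grouping rule are assumptions for illustration and may differ from what the PR actually does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the sizing options shown in the example config. */
final class MergeRollupSizingConfig {
  final long maxSegmentSizeBytesPerTask;
  final long desiredSegmentSizeBytes;
  final String segmentSizingStrategy;
  final int sizingPercentile;

  private MergeRollupSizingConfig(long maxBytes, long desiredBytes, String strategy, int percentile) {
    this.maxSegmentSizeBytesPerTask = maxBytes;
    this.desiredSegmentSizeBytes = desiredBytes;
    this.segmentSizingStrategy = strategy;
    this.sizingPercentile = percentile;
  }

  /** Parses the string-valued task config map; all defaults below are assumed, not from the PR. */
  static MergeRollupSizingConfig fromTaskConfig(Map<String, String> taskConfig) {
    long maxBytes = Long.parseLong(
        taskConfig.getOrDefault("maxSegmentSizeBytesPerTask", String.valueOf(Long.MAX_VALUE)));
    // -1 is used here to mean "size-based sizing disabled" (assumption).
    long desiredBytes = Long.parseLong(taskConfig.getOrDefault("desiredSegmentSizeBytes", "-1"));
    String strategy = taskConfig.getOrDefault("segmentSizingStrategy", "PERCENTILE");
    int percentile = Integer.parseInt(taskConfig.getOrDefault("sizingPercentile", "75"));
    if (percentile < 1 || percentile > 100) {
      throw new IllegalArgumentException("sizingPercentile must be in [1, 100]");
    }
    return new MergeRollupSizingConfig(maxBytes, desiredBytes, strategy, percentile);
  }
}

/** Hypothetical greedy grouping of segments into tasks by cumulative size. */
final class SizeBasedGrouper {
  static List<List<Long>> group(List<Long> segmentSizesBytes, long maxBytesPerTask) {
    List<List<Long>> groups = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long currentBytes = 0;
    for (long size : segmentSizesBytes) {
      // Start a new group when adding this segment would exceed the cap.
      // A single oversized segment still gets a group of its own.
      if (!current.isEmpty() && currentBytes + size > maxBytesPerTask) {
        groups.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(size);
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}
```

For example, `SizeBasedGrouper.group(List.of(100L, 200L, 300L, 50L), 300L)` splits the input into three groups in this sketch: the first two segments together, then the third, then the fourth.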
Instructions:
The PR has to be tagged with at least one of the following labels (*):
- feature
- performance
- release-notes
  - New configuration options
:x: 1 Tests Failed:
| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 10107 | 1 | 10106 | 47 |
Top 3 failed test(s) by shortest run time:
- org.apache.pinot.plugin.minion.tasks.mergerollup.MergeRollupTaskGeneratorTest::testMaxSegmentSizeBytesPerTask (0.014s run time)
  expected [1] but found [2]
- org.apache.pinot.controller.helix.core.minion.PinotTaskManagerDistributedLockingTest::testConcurrentCreateTaskFromMultipleControllers (9.07s run time)
  At least one task generation should have occurred expected [1] but found [2]
- org.apache.pinot.controller.helix.core.minion.PinotTaskManagerDistributedLockingTest::testConcurrentCreateTaskFromMultipleControllers (12s run time)
  At least one task generation should have occurred expected [1] but found [2]