[Query] Server level runtime query throttling (pause/limit)
Why
- Keep servers stable under load (reduce OOM and spike-driven failures)
- Improve user experience by preventing single-query hogging and lowering error rates
What
- Global concurrency gate for in-flight queries integrated into FCFS and Priority schedulers
- Allows pausing/serializing queries (FIFO) or limiting concurrency at runtime
- Level-aware behavior driven by existing accounting thresholds:
  - Alarm: optionally reduce concurrency (e.g., 1 to serialize)
  - Critical: keep existing “kill the most expensive query”
  - Panic: keep existing “kill all in-flight queries”
- Dynamic config updates (live, no restart) for throttling settings under pinot.query.scheduler.*
- Server admin API to inspect and adjust the current concurrency limit on the fly
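To illustrate the idea of a concurrency gate whose limit can change at runtime, here is a minimal sketch. It is not the PR's ThrottlingRuntime; the class name `ResizableGate` and all of its methods are hypothetical, and it assumes a fair `java.util.concurrent.Semaphore` is an acceptable way to get FIFO ordering of waiting queries.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical resizable permit gate (illustration only, not the PR's code). */
final class ResizableGate {
  /** Semaphore subclass that exposes the protected reducePermits method. */
  private static final class AdjustableSemaphore extends Semaphore {
    AdjustableSemaphore(int permits) {
      super(permits, true); // fair: blocked acquirers are served in FIFO order
    }

    void reduce(int reduction) {
      super.reducePermits(reduction);
    }
  }

  private final AdjustableSemaphore _semaphore;
  private int _limit;

  ResizableGate(int limit) {
    if (limit < 1) {
      throw new IllegalArgumentException("limit must be >= 1");
    }
    _limit = limit;
    _semaphore = new AdjustableSemaphore(limit);
  }

  /** Blocks until a permit is free; call before running a query. */
  void acquire() throws InterruptedException {
    _semaphore.acquire();
  }

  /** Non-blocking variant; note that untimed tryAcquire ignores fairness. */
  boolean tryAcquire() {
    return _semaphore.tryAcquire();
  }

  /** Returns a permit; call when the query finishes. */
  void release() {
    _semaphore.release();
  }

  /** Raises or lowers the cap without interrupting in-flight holders. */
  synchronized void setLimit(int newLimit) {
    if (newLimit < 1) {
      throw new IllegalArgumentException("limit must be >= 1");
    }
    int delta = newLimit - _limit;
    if (delta > 0) {
      _semaphore.release(delta);
    } else if (delta < 0) {
      // Available permits may go negative; new acquirers then wait until
      // enough in-flight holders have released.
      _semaphore.reduce(-delta);
    }
    _limit = newLimit;
  }

  synchronized int getLimit() {
    return _limit;
  }
}
```

Lowering the limit in this sketch never cancels running work; it only delays new admissions until the in-flight count drops below the new cap, which matches the "pause/serialize" intent described above.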
Config (live reloaded)
- pinot.query.scheduler.throttling.pause_on_alarm: boolean, default false
- pinot.query.scheduler.throttling.alarm_max_concurrent: int, cap during Alarm (e.g., 1)
- pinot.query.scheduler.throttling.normal_max_concurrent: int, cap during Normal (defaults to query runner threads)

This complements existing heap-usage queue throttling (unchanged).
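As a rough sketch of how the three keys could combine into an effective limit, consider the following. The class `ThrottleConfigSketch`, the `Level` enum, and the `effectiveLimit` helper are hypothetical. The sketch also assumes that the Alarm cap only applies when pause_on_alarm is true and that it falls back to 1 when unset; the description above gives 1 only as an example, so treat both as assumptions rather than the PR's actual behavior.

```java
import java.util.Map;

/** Hypothetical resolver for the throttling keys (illustration only). */
final class ThrottleConfigSketch {
  /** Only the two levels that affect the concurrency cap in this sketch. */
  enum Level { NORMAL, ALARM }

  static final String PREFIX = "pinot.query.scheduler.throttling.";

  private ThrottleConfigSketch() {
  }

  /**
   * Computes the concurrency cap for the given level.
   *
   * @param config flat key/value view of the server config
   * @param level current accounting level
   * @param queryRunnerThreads default for normal_max_concurrent
   */
  static int effectiveLimit(Map<String, String> config, Level level, int queryRunnerThreads) {
    int normalMax = Integer.parseInt(
        config.getOrDefault(PREFIX + "normal_max_concurrent", String.valueOf(queryRunnerThreads)));
    boolean pauseOnAlarm = Boolean.parseBoolean(
        config.getOrDefault(PREFIX + "pause_on_alarm", "false"));
    if (level == Level.ALARM && pauseOnAlarm) {
      // Assumption: fall back to 1 (serialize) when the Alarm cap is unset.
      return Integer.parseInt(config.getOrDefault(PREFIX + "alarm_max_concurrent", "1"));
    }
    return normalMax;
  }
}
```

With an empty config the helper returns the query runner thread count at every level, which mirrors the opt-in default stated in the description.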
Admin API (server)
- GET /throttling/state → current flags and concurrency limit
- POST /throttling/setLimit with body:
{ "_limit": 2 }
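For reference, the two calls could be built with the JDK's `java.net.http` client roughly as follows. The helper class `ThrottlingAdminRequests` is hypothetical, the base URL (host and admin port) is an assumption supplied by the caller, and the JSON Content-Type header is assumed rather than confirmed by the description; the paths and body shape are copied from the list above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical request builders for the admin endpoints (illustration only). */
final class ThrottlingAdminRequests {
  private ThrottlingAdminRequests() {
  }

  /** Builds GET {baseUrl}/throttling/state. */
  static HttpRequest getState(String baseUrl) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/throttling/state"))
        .GET()
        .build();
  }

  /** Builds POST {baseUrl}/throttling/setLimit with the JSON body shown above. */
  static HttpRequest setLimit(String baseUrl, int limit) {
    String body = "{ \"_limit\": " + limit + " }";
    return HttpRequest.newBuilder(URI.create(baseUrl + "/throttling/setLimit"))
        .header("Content-Type", "application/json") // assumed content type
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
  }
}
```

Either request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a server's admin address.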
Design notes
- New ThrottlingRuntime provides a permit gate shared by schedulers
- QueryResourceAggregator informs the gate on Alarm/Normal transitions
- ResourceUsageAccountantFactory applies pinot.query.scheduler.* changes at runtime
- Default behavior unchanged unless config or admin API is used
Backward compatibility and rollout
- Opt-in; defaults preserve existing behavior
- Safe to enable gradually via config or admin API
Codecov Report
:x: Patch coverage is 66.08696% with 39 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.42%. Comparing base (3b25db7) to head (068c1b8).
:warning: Report is 2 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #17078 +/- ##
============================================
- Coverage 63.43% 63.42% -0.02%
Complexity 1424 1424
============================================
Files 3089 3091 +2
Lines 182442 182557 +115
Branches 28006 28019 +13
============================================
+ Hits 115730 115779 +49
- Misses 57768 57821 +53
- Partials 8944 8957 +13
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (ø) | |
| integration | 100.00% <ø> (ø) | |
| integration1 | 100.00% <ø> (ø) | |
| integration2 | 0.00% <ø> (ø) | |
| java-11 | 63.39% <66.08%> (-0.02%) | :arrow_down: |
| java-21 | 63.39% <66.08%> (-0.02%) | :arrow_down: |
| temurin | 63.42% <66.08%> (-0.02%) | :arrow_down: |
| unittests | 63.41% <66.08%> (-0.02%) | :arrow_down: |
| unittests1 | 56.27% <62.00%> (+<0.01%) | :arrow_up: |
| unittests2 | 33.59% <30.43%> (-0.02%) | :arrow_down: |
Is this feature built for SSE or MSE? IIRC, MSE queries don't go through QueryScheduler.