Make ingestion delay configurable: with concurrency fixes
We ran into some errors with the previous version (#14074), so it was reverted in #14127. Here's the new PR with multiple fixes applied after the revert.
The ingestion offset lag metric currently makes a request to the Kafka broker to fetch the latest offset on every update.
However, this has led to an increasing amount of load on Kafka brokers for multiple users.
This PR adds a way to enable/disable the metric and to configure its update interval using the following properties in the server or cluster configs:
pinot.server.instance.offset.lag.tracking.enable: true
pinot.server.instance.offset.lag.tracking.update.interval: 60000
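For illustration, a minimal sketch of how these properties might be set in a Pinot server configuration file (standard Java-properties syntax; the exact file name and placement depend on the deployment):

```properties
# Enable offset lag tracking; the update interval is assumed to be in milliseconds (60000 = 1 minute)
pinot.server.instance.offset.lag.tracking.enable=true
pinot.server.instance.offset.lag.tracking.update.interval=60000
```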
Codecov Report
Attention: Patch coverage is 0% with 68 lines in your changes missing coverage. Please review.
Project coverage is 34.18%. Comparing base (59551e4) to head (979c83c). Report is 1264 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...e/data/manager/realtime/IngestionDelayTracker.java | 0.00% | 67 Missing :warning: |
| ...a/manager/realtime/RealtimeSegmentDataManager.java | 0.00% | 1 Missing :warning: |
:exclamation: There is a different number of reports uploaded between BASE (59551e4) and HEAD (979c83c).
HEAD has 9 uploads less than BASE:

| Flag | BASE (59551e4) | HEAD (979c83c) |
|---|---|---|
| integration2 | 3 | 2 |
| temurin | 12 | 11 |
| skip-bytebuffers-false | 7 | 5 |
| unittests | 5 | 3 |
| unittests1 | 2 | 0 |
| java-11 | 5 | 4 |
Additional details and impacted files
@@ Coverage Diff @@
## master #14142 +/- ##
=============================================
- Coverage 61.75% 34.18% -27.57%
- Complexity 207 765 +558
=============================================
Files 2436 2660 +224
Lines 133233 145739 +12506
Branches 20636 22297 +1661
=============================================
- Hits 82274 49826 -32448
- Misses 44911 91894 +46983
+ Partials 6048 4019 -2029
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) | :arrow_up: |
| integration | 100.00% <ø> (+99.99%) | :arrow_up: |
| integration1 | 100.00% <ø> (+99.99%) | :arrow_up: |
| integration2 | 0.00% <ø> (ø) | |
| java-11 | 34.18% <0.00%> (-27.53%) | :arrow_down: |
| java-21 | 34.18% <0.00%> (-27.45%) | :arrow_down: |
| skip-bytebuffers-false | 34.18% <0.00%> (-27.56%) | :arrow_down: |
| skip-bytebuffers-true | 34.16% <0.00%> (+6.43%) | :arrow_up: |
| temurin | 34.18% <0.00%> (-27.57%) | :arrow_down: |
| unittests | 34.18% <0.00%> (-27.57%) | :arrow_down: |
| unittests1 | ? | |
| unittests2 | 34.18% <0.00%> (+6.45%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
@swaminathanmanish Please help review this
Before this PR, the logic is:
- For every message batch, we make a separate metadata fetch request to get the latest offset

This PR:
- Moves the latest offset fetching logic into `IngestionDelayTracker` and periodically refreshes `IngestionInfo._latestOffset`
- Still uses `_latestOffset - _currentOffset` to compute the ingestion lag
- The problem with this approach is that `_latestOffset` and `_currentOffset` are not updated together

To solve the above problem:
- Move the latest offset fetching logic into `IngestionDelayTracker`
- Fetch the latest offset within `updateIngestionMetrics()`, but with the given minimum interval (not on every invocation)
- Directly store `_offsetLag` instead of `_latestOffset` in the `IngestionInfo` (see the sketch after this list)
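A minimal sketch of the approach in the list above, assuming a hypothetical `OffsetLagTracker` class and a `LongSupplier` wrapping the Kafka metadata fetch; names and signatures are illustrative, not the actual Pinot `IngestionDelayTracker` API:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Illustrative sketch: fetch the latest offset at most once per interval and store the lag directly. */
public class OffsetLagTracker {
  private final LongSupplier _latestOffsetSupplier; // wraps the broker metadata fetch (assumption)
  private final long _updateIntervalMs;             // minimum interval between fetches
  private final AtomicLong _lastFetchTimeMs = new AtomicLong(0);
  private volatile long _offsetLag = 0;

  public OffsetLagTracker(LongSupplier latestOffsetSupplier, long updateIntervalMs) {
    _latestOffsetSupplier = latestOffsetSupplier;
    _updateIntervalMs = updateIntervalMs;
  }

  /** Called on every metrics update; only hits the broker when the interval has elapsed. */
  public void updateIngestionMetrics(long currentOffset) {
    long now = System.currentTimeMillis();
    long lastFetch = _lastFetchTimeMs.get();
    // CAS on the fetch timestamp so concurrent updates don't trigger duplicate broker requests.
    if (now - lastFetch >= _updateIntervalMs && _lastFetchTimeMs.compareAndSet(lastFetch, now)) {
      long latestOffset = _latestOffsetSupplier.getAsLong();
      // Store the lag itself (not the latest offset), so both offsets come from the same update.
      _offsetLag = Math.max(0, latestOffset - currentOffset);
    }
  }

  public long getOffsetLag() {
    return _offsetLag;
  }
}
```

The compare-and-set on the fetch timestamp is one simple way to ensure only a single thread performs the broker request per interval.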
Alternatively, we can directly calculate the lag in `RealtimeSegmentDataManager` with the given minimum interval (putting null when the latest offset is not fetched). This way we don't need to manage the metadata fetch within `IngestionDelayTracker`.
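For contrast, a sketch of this second alternative under the same assumptions: a hypothetical helper used by `RealtimeSegmentDataManager` computes the lag itself and returns null when the minimum interval has not elapsed, so `IngestionDelayTracker` never manages the metadata fetch:

```java
import java.util.function.LongSupplier;

/** Illustrative sketch of the consumer-side alternative; not the actual Pinot API. */
public class PartitionLagCalculator {
  private final LongSupplier _latestOffsetSupplier; // wraps the broker metadata fetch (assumption)
  private final long _minFetchIntervalMs;
  private long _lastFetchTimeMs;

  public PartitionLagCalculator(LongSupplier latestOffsetSupplier, long minFetchIntervalMs) {
    _latestOffsetSupplier = latestOffsetSupplier;
    _minFetchIntervalMs = minFetchIntervalMs;
  }

  /** Returns the offset lag, or null when the minimum interval has not elapsed (no fetch issued). */
  public synchronized Long maybeComputeOffsetLag(long currentOffset) {
    long now = System.currentTimeMillis();
    if (now - _lastFetchTimeMs < _minFetchIntervalMs) {
      return null; // caller keeps the previous metric value or skips the update
    }
    _lastFetchTimeMs = now;
    return Math.max(0, _latestOffsetSupplier.getAsLong() - currentOffset);
  }
}
```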
@Jackie-Jiang went with the first alternative
I kind of prefer the second way, so that we don't need to manage the metadata fetch within `IngestionDelayTracker`. It should greatly simplify the logic. Can you give it a try?
Please also take a look at the test failure.
Closing this in favor of #15831