pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Make ingestion delay configurable: with concurrency fixes

Open KKcorps opened this issue 1 year ago • 4 comments

We ran into some errors with previous version #14074 and hence it was reverted with #14127 . Here's the new PR with multiple fixes after the revert

The ingestion offset lag metric currently makes a request to kafka broker to get the latest offset on every update.

However, this has lead to an increasing amount of load on kafka brokers for multiple users.

The PR adds a way to enable/disable this metric and also configure its interval using the following properties in the server or cluster configs

pinot.server.instance.offset.lag.tracking.enable: true
pinot.server.instance.offset.lag.tracking.update.interval: 60000

KKcorps avatar Oct 02 '24 05:10 KKcorps

Codecov Report

Attention: Patch coverage is 0% with 68 lines in your changes missing coverage. Please review.

Project coverage is 34.18%. Comparing base (59551e4) to head (979c83c). Report is 1264 commits behind head on master.

Files with missing lines Patch % Lines
...e/data/manager/realtime/IngestionDelayTracker.java 0.00% 67 Missing :warning:
...a/manager/realtime/RealtimeSegmentDataManager.java 0.00% 1 Missing :warning:

:exclamation: There is a different number of reports uploaded between BASE (59551e4) and HEAD (979c83c). Click for more details.

HEAD has 9 uploads less than BASE
Flag BASE (59551e4) HEAD (979c83c)
integration2 3 2
temurin 12 11
skip-bytebuffers-false 7 5
unittests 5 3
unittests1 2 0
java-11 5 4
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #14142       +/-   ##
=============================================
- Coverage     61.75%   34.18%   -27.57%     
- Complexity      207      765      +558     
=============================================
  Files          2436     2660      +224     
  Lines        133233   145739    +12506     
  Branches      20636    22297     +1661     
=============================================
- Hits          82274    49826    -32448     
- Misses        44911    91894    +46983     
+ Partials       6048     4019     -2029     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) :arrow_up:
integration 100.00% <ø> (+99.99%) :arrow_up:
integration1 100.00% <ø> (+99.99%) :arrow_up:
integration2 0.00% <ø> (ø)
java-11 34.18% <0.00%> (-27.53%) :arrow_down:
java-21 34.18% <0.00%> (-27.45%) :arrow_down:
skip-bytebuffers-false 34.18% <0.00%> (-27.56%) :arrow_down:
skip-bytebuffers-true 34.16% <0.00%> (+6.43%) :arrow_up:
temurin 34.18% <0.00%> (-27.57%) :arrow_down:
unittests 34.18% <0.00%> (-27.57%) :arrow_down:
unittests1 ?
unittests2 34.18% <0.00%> (+6.45%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Oct 02 '24 06:10 codecov-commenter

@swaminathanmanish Please help review this

Jackie-Jiang avatar Oct 07 '24 22:10 Jackie-Jiang

Before the PR, the logic is:

  • For every message batch, we make a separate metadata fetch request to get the latest offset

This PR:

  • Move the latest offset fetching logic into IngestionDelayTracker and periodically refresh IngestionInfo._latestOffset
  • Still use _latestOffset - _currentOffset to compute the ingestion lag
  • The problem of this approach is that _latestOffset and _currentOffset are not updated together

To solve the above problem:

  • Move the latest offset fetching logic into IngestionDelayTracker
  • Fetch latest offset within updateIngestionMetrics(), but with the given minimum interval (not fetch for every invocation)
  • Directly store _offsetLag instead of _latestOffset in the IngestionInfo

Jackie-Jiang avatar Oct 22 '24 21:10 Jackie-Jiang

Alternatively, we can actually directly calculate the lag in the RealtimeSegmentDataManager with the given minimum interval (put null when latest offset is not fetched). This way we don't need to manage metadata fetch within IngestionDelayTracker

Jackie-Jiang avatar Oct 23 '24 06:10 Jackie-Jiang

@Jackie-Jiang went with the first alternative

KKcorps avatar Oct 30 '24 16:10 KKcorps

I kind of prefer the second way, so that we don't need to manage metadata fetch within IngestionDelayTracker. It should be able to greatly simplify the logic. Can you give it a try? Please also take a look at the test failure

Jackie-Jiang avatar Nov 01 '24 06:11 Jackie-Jiang

Closing this in favor of #15831

KKcorps avatar May 19 '25 04:05 KKcorps