pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Deterministic data-only CRC (2/2) - use data-only CRC for some segment replace flow

Open anuragrai16 opened this issue 2 months ago • 5 comments

Motivation

Resolves https://github.com/apache/pinot/issues/17262

Summary

This PR is a follow up of https://github.com/apache/pinot/pull/17264 that added a computation of data-only CRC for segments and uploaded it to ZK. It introduces a new parameter useDataCrcIfAvailable to the replaceSegmentIfCrcMismatch method, enabling more intelligent segment replacement decisions by distinguishing between data changes and index-only changes.

Details The data CRC is not used in explicit Reload and Replace calls from Helix, essentially keeping the behaviour same.

It is only used in the flow of RealtimeTableDataManager.doAddOnlineSegment() and OfflineTableDataManager.doAddOnlineSegment() , where the decisions of downloading segments (even if a local copy may be present) is taken based on the CRC.

The hasSameCRC method implements the fallback logic:

Primary Check: Compare Full Segment CRC (preserves existing behavior) Fallback Check: If Full CRC mismatches and useDataCrcIfAvailable=true, compare Data-Only CRC Legacy Support: If Data-Only CRC unavailable (< 0), fall back to Full CRC comparison

Overall, Code paths that continue to use full CRC (useDataCrcIfAvailable = false),

  1. Explicit Segment Replacement API
  2. Segment Reload Operations

Code paths that will use Data CRC when available (useDataCrcIfAvailable = true),

  1. CONSUMING→ONLINE transition for realtime segments
  2. Offline Table Segment Addition
  3. Server Startup and Segment Preload

anuragrai16 avatar Dec 16 '25 09:12 anuragrai16

Codecov Report

:x: Patch coverage is 73.33333% with 4 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.23%. Comparing base (d47ea27) to head (3911efb). :warning: Report is 101 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 50.00% 2 Missing and 1 partial :warning:
...ommon/metadata/segment/SegmentZKMetadataUtils.java 66.66% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##             master   #17380     +/-   ##
===========================================
  Coverage     63.22%   63.23%             
- Complexity     1475     1477      +2     
===========================================
  Files          3147     3167     +20     
  Lines        187575   189207   +1632     
  Branches      28712    28956    +244     
===========================================
+ Hits         118590   119640   +1050     
- Misses        59794    60281    +487     
- Partials       9191     9286     +95     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.21% <73.33%> (+0.01%) :arrow_up:
java-21 63.20% <73.33%> (+0.02%) :arrow_up:
temurin 63.23% <73.33%> (+<0.01%) :arrow_up:
unittests 63.22% <73.33%> (+<0.01%) :arrow_up:
unittests1 55.56% <60.00%> (-0.07%) :arrow_down:
unittests2 33.99% <40.00%> (+0.10%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Dec 16 '25 10:12 codecov-commenter

@Jackie-Jiang PTAL as well.

anuragrai16 avatar Dec 16 '25 11:12 anuragrai16

cc @swaminathanmanish @KKcorps @krishan1390

Jackie-Jiang avatar Dec 16 '25 23:12 Jackie-Jiang

We need to discuss when should we use data CRC instead of index CRC, and what is the side effect. When using data CRC, index only change happening in the deep store (i.e. new index added) won't be honored. This could prevent users from creating the index from minion and reduce the index creation on server. Given we want to solve the problem of real-time committed segment potentially having different CRC, I feel a better way to address this is to add a flag in ZK metadata to indicate that we can check only the data CRC. This flag only exists in committed segment, but not segment pushed from other ingestion flow

Thanks @Jackie-Jiang , we can do that. For my understanding, in the current code, I'm only using Data CRC in doAddOnlineSegment of the class OfflineTableDataManager and RealtimeTableDataManager , which are called in helix transition states during onBecomeOnlineFromConsuming and onBecomeOnlineFromOffline.

For the other flows of reload segment, replace segments (used by minions), data CRC is not used. So, in a way, is the code already handling this point ? Or are there other flows that might be accidentally included in this ?

anuragrai16 avatar Dec 17 '25 05:12 anuragrai16

We need to discuss when should we use data CRC instead of index CRC, and what is the side effect. When using data CRC, index only change happening in the deep store (i.e. new index added) won't be honored. This could prevent users from creating the index from minion and reduce the index creation on server. Given we want to solve the problem of real-time committed segment potentially having different CRC, I feel a better way to address this is to add a flag in ZK metadata to indicate that we can check only the data CRC. This flag only exists in committed segment, but not segment pushed from other ingestion flow

Thanks @Jackie-Jiang , we can do that. For my understanding, in the current code, I'm only using Data CRC in doAddOnlineSegment of the class OfflineTableDataManager and RealtimeTableDataManager , which are called in helix transition states during onBecomeOnlineFromConsuming and onBecomeOnlineFromOffline.

For the other flows of reload segment, replace segments (used by minions), data CRC is not used. So, in a way, is the code already handling this point ? Or are there other flows that might be accidentally included in this ?

These helix state transitions apply to a lot of scenarios, not only for the committed segments. E.g. when server starts, all segments are loaded through these 2 state transitions, which will make server ignore changes applied by minions.

Jackie-Jiang avatar Dec 18 '25 00:12 Jackie-Jiang

We need to discuss when should we use data CRC instead of index CRC, and what is the side effect. When using data CRC, index only change happening in the deep store (i.e. new index added) won't be honored. This could prevent users from creating the index from minion and reduce the index creation on server. Given we want to solve the problem of real-time committed segment potentially having different CRC, I feel a better way to address this is to add a flag in ZK metadata to indicate that we can check only the data CRC. This flag only exists in committed segment, but not segment pushed from other ingestion flow

Thanks @Jackie-Jiang , we can do that. For my understanding, in the current code, I'm only using Data CRC in doAddOnlineSegment of the class OfflineTableDataManager and RealtimeTableDataManager , which are called in helix transition states during onBecomeOnlineFromConsuming and onBecomeOnlineFromOffline. For the other flows of reload segment, replace segments (used by minions), data CRC is not used. So, in a way, is the code already handling this point ? Or are there other flows that might be accidentally included in this ?

These helix state transitions apply to a lot of scenarios, not only for the committed segments. E.g. when server starts, all segments are loaded through these 2 state transitions, which will make server ignore changes applied by minions.

@Jackie-Jiang Makes sense. Updated the logic to persist an optional ZK flag for realtime committing segments that will be used to tell the replicas to use data CRC for replace. PTAL.

anuragrai16 avatar Dec 30 '25 07:12 anuragrai16