pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Set skipCrcCheckOnLoad in table config

Open cypherean opened this issue 3 months ago • 2 comments

Fix for #17233

This PR adds skipCrcCheckOnLoad as table config as an alternate to server level config to allow for fine grained control on disabling CRC check for tables.

New config added: skipCrcCheckOnLoad under segmentsConfig in table config

Test Plan

Tested with Pinot batch quickstart:

  1. With skipCrcCheckOnLoad config enabled in airlineStats table:
Handling message: ZnRecord=cb7bee95-dde7-41c9-a0ad-53e4a26e6381, {CREATE_TIMESTAMP=1763551164828, EXECUTE_START_TIMESTAMP=1763551164866, MSG_ID=cb7bee95-dde7-41c9-a0ad-53e4a26e6381, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=airlineStats_OFFLINE_16074_16074_0, RESOURCE_NAME=airlineStats_OFFLINE, RETRY_COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME=Server_127.0.0.1_7050, TGT_SESSION_ID=1005c2cb4d30018, TIMEOUT=-1, segmentName=airlineStats_OFFLINE_16074_16074_0, tableName=airlineStats_OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1763551164837, _modifiedTime=1763551164837, _ephemeralOwner=0}
Replacing segment: airlineStats_OFFLINE_16074_16074_0 in table: airlineStats_OFFLINE
Replacing segment: airlineStats_OFFLINE_16074_16074_0
END: GenericClusterController.onMessage() for cluster QuickStartCluster
144 END:INVOKE CallbackHandler 26, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.controller.GenericHelixController@10bdfbcc type: CALLBACK Took: 1ms
The latency of message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 is 38 ms
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
Waiting for lock to reload/refresh: airlineStats_OFFLINE_16074_16074_0, queue-length: 0
Acquired lock to reload/refresh segment: airlineStats_OFFLINE_16074_16074_0 (lock-time=0ms, queue-length=0)
Skipping replacing segment: airlineStats_OFFLINE_16074_16074_0 even though its CRC has changed from: 2686808396 to: 2686808519 because skipCrcCheckOnLoad is enabled
Message cb7bee95-dde7-41c9-a0ad-53e4a26e6381 completed.
Scheduling message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7: brokerResource:airlineStats_OFFLINE, null->null
Submit task: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 to pool: java.util.concurrent.ThreadPoolExecutor@6350aec6[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
Message: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 handling task scheduled
377 END:INVOKE CallbackHandler 28, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@702c26cd type: CALLBACK Took: 7ms
handling task: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 begin, at: 1763551164872
Refreshing segment: airlineStats_OFFLINE_16074_16074_0 for table: airlineStats_OFFLINE
Refreshed segment: airlineStats_OFFLINE_16074_16074_0 for table: airlineStats_OFFLINE
Message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 completed.
Handled request from 127.0.0.1 POST http://localhost:9002/v2/segments?tableName=airlineStats&tableType=OFFLINE, content-type multipart/form-data; charset=ISO-8859-1; boundary=httpclient_boundary_8c757373-6c11-4c17-a0c3-2d3fad879992 status code 200 OK
Delete message cb7bee95-dde7-41c9-a0ad-53e4a26e6381 from zk!
message finished: cb7bee95-dde7-41c9-a0ad-53e4a26e6381, took 11
Message: cb7bee95-dde7-41c9-a0ad-53e4a26e6381 (parent: null) handling task for airlineStats_OFFLINE:airlineStats_OFFLINE_16074_16074_0 completed at: 1763551164877, results: true. FrameworkTime: 11 ms; HandlerTime: 1 ms.
Delete message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 from zk!
message finished: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7, took 10
  1. With skipCrcCheckOnLoad disabled on billing table by default:
Handling message: ZnRecord=c5bb3f12-3937-4761-99ed-b7833f02789b, {CREATE_TIMESTAMP=1763552941563, EXECUTE_START_TIMESTAMP=1763552941592, MSG_ID=c5bb3f12-3937-4761-99ed-b7833f02789b, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=billing_OFFLINE_0, RESOURCE_NAME=billing_OFFLINE, RETRY_COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME=Server_127.0.0.1_7050, TGT_SESSION_ID=1005c43051f0018, TIMEOUT=-1, segmentName=billing_OFFLINE_0, tableName=billing_OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1763552941572, _modifiedTime=1763552941572, _ephemeralOwner=0}
CallbackHandler26, Subscribing to path: /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES took: 0
START: GenericClusterController.onMessage() for cluster QuickStartCluster
Replacing segment: billing_OFFLINE_0 in table: billing_OFFLINE
Replacing segment: billing_OFFLINE_0
CallbackHandler28, Subscribing to path: /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES took: 0
END: GenericClusterController.onMessage() for cluster QuickStartCluster
144 END:INVOKE CallbackHandler 26, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.controller.GenericHelixController@382d549a type: CALLBACK Took: 0ms
START: InstanceMessagesCache.refresh()
The latency of message dd9dfcb1-3f9c-4c3e-b009-9ba5f24ef8bb is 30 ms
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
Waiting for lock to reload/refresh: billing_OFFLINE_0, queue-length: 0
Acquired lock to reload/refresh segment: billing_OFFLINE_0 (lock-time=0ms, queue-length=0)
Replacing segment: billing_OFFLINE_0 because its CRC has changed from: 2262953635 to: 2262953636
Downloading and loading segment: billing_OFFLINE_0
Downloading segment: billing_OFFLINE_0 from: http://127.0.0.1:9002/segments/billing/billing_OFFLINE_0
Acquiring instance level segment download semaphore for segment: billing_OFFLINE_0, queue-length: 0
Acquired instance level segment download semaphore for segment: billing_OFFLINE_0 (lock-time=0ms, queue-length=0).

Also tested with realtime tables with lucene index and upsert compaction on our test clusters.

cypherean avatar Nov 20 '25 17:11 cypherean

Codecov Report

:x: Patch coverage is 78.26087% with 15 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.26%. Comparing base (8de6fa9) to head (aeec6d0). :warning: Report is 239 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 75.80% 8 Missing and 7 partials :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17249      +/-   ##
============================================
+ Coverage     63.20%   63.26%   +0.05%     
- Complexity     1428     1476      +48     
============================================
  Files          3117     3167      +50     
  Lines        184391   189109    +4718     
  Branches      28278    28933     +655     
============================================
+ Hits         116552   119646    +3094     
- Misses        58829    60184    +1355     
- Partials       9010     9279     +269     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.22% <78.26%> (+0.03%) :arrow_up:
java-21 63.22% <78.26%> (+0.07%) :arrow_up:
temurin 63.26% <78.26%> (+0.05%) :arrow_up:
unittests 63.26% <78.26%> (+0.05%) :arrow_up:
unittests1 55.62% <78.26%> (-0.48%) :arrow_down:
unittests2 34.01% <8.69%> (+0.35%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Nov 20 '25 18:11 codecov-commenter

Since the segment management is done via a state machine, the only way to get consistent behavior is to always ignore the crc check when loading segment, which also means segment refresh (in-place replacement) won't be supported. Do you see this as an okay trade-off? If so, let's clearly document the behavior

Jackie-Jiang avatar Dec 08 '25 00:12 Jackie-Jiang

Since the segment management is done via a state machine, the only way to get consistent behavior is to always ignore the crc check when loading segment, which also means segment refresh (in-place replacement) won't be supported. Do you see this as an okay trade-off? If so, let's clearly document the behavior

@Jackie-Jiang this gap is already exposed as an instance level config. Going forward

  1. Do we want to ensure this is only exposed on add, reload paths and we're able to detect replace even when server is down and treat it as such?
  2. Or do we allow to skip crc check for all 3 as is the current behavior and document the gaps for the user?
  3. Or try add/replacement on a best effort basis and serve from local copy on failure (same as reload), but actively alert and repair?

My take - We need a combination of 1 and 3, not just as a short term fix but as a fallback for reliability. Suggested fixes:

  1. Detect replace even when server is down and treat it as replace not add
  2. Try add/replacement on a best effort basis and serve from local copy on failure - this will be behind the flag that let's us skip crc check right now.

Pushing the changes for (2), I'll look into 1 in a follow up PR

cypherean avatar Jan 12 '26 19:01 cypherean

After #17380, do you still see this change needed?

Jackie-Jiang avatar Jan 14 '26 23:01 Jackie-Jiang

After #17380, do you still see this change needed?

I think this change is still needed. We should handle the already-introduced CRC skip more gracefully without limiting in-place replacement. Making it safer before deprecating the configuration feels like the right approach.

cypherean avatar Jan 16 '26 07:01 cypherean