Set skipCrcCheckOnLoad in table config
Fix for #17233
This PR adds skipCrcCheckOnLoad as a table-level config, as an alternative to the server-level config, to allow fine-grained, per-table control over disabling the CRC check.
New config added: skipCrcCheckOnLoad under segmentsConfig in the table config.
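For illustration, the new flag would sit under segmentsConfig like this (table name and the other fields are placeholders; only the skipCrcCheckOnLoad key is what this PR adds):

```json
{
  "tableName": "airlineStats",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": "1",
    "skipCrcCheckOnLoad": "true"
  }
}
```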
Test Plan
Tested with Pinot batch quickstart:
- With the skipCrcCheckOnLoad config enabled on the airlineStats table:
Handling message: ZnRecord=cb7bee95-dde7-41c9-a0ad-53e4a26e6381, {CREATE_TIMESTAMP=1763551164828, EXECUTE_START_TIMESTAMP=1763551164866, MSG_ID=cb7bee95-dde7-41c9-a0ad-53e4a26e6381, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=airlineStats_OFFLINE_16074_16074_0, RESOURCE_NAME=airlineStats_OFFLINE, RETRY_COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME=Server_127.0.0.1_7050, TGT_SESSION_ID=1005c2cb4d30018, TIMEOUT=-1, segmentName=airlineStats_OFFLINE_16074_16074_0, tableName=airlineStats_OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1763551164837, _modifiedTime=1763551164837, _ephemeralOwner=0}
Replacing segment: airlineStats_OFFLINE_16074_16074_0 in table: airlineStats_OFFLINE
Replacing segment: airlineStats_OFFLINE_16074_16074_0
END: GenericClusterController.onMessage() for cluster QuickStartCluster
144 END:INVOKE CallbackHandler 26, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.controller.GenericHelixController@10bdfbcc type: CALLBACK Took: 1ms
The latency of message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 is 38 ms
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
Waiting for lock to reload/refresh: airlineStats_OFFLINE_16074_16074_0, queue-length: 0
Acquired lock to reload/refresh segment: airlineStats_OFFLINE_16074_16074_0 (lock-time=0ms, queue-length=0)
Skipping replacing segment: airlineStats_OFFLINE_16074_16074_0 even though its CRC has changed from: 2686808396 to: 2686808519 because skipCrcCheckOnLoad is enabled
Message cb7bee95-dde7-41c9-a0ad-53e4a26e6381 completed.
Scheduling message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7: brokerResource:airlineStats_OFFLINE, null->null
Submit task: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 to pool: java.util.concurrent.ThreadPoolExecutor@6350aec6[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
Message: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 handling task scheduled
377 END:INVOKE CallbackHandler 28, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.messaging.handling.HelixTaskExecutor@702c26cd type: CALLBACK Took: 7ms
handling task: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 begin, at: 1763551164872
Refreshing segment: airlineStats_OFFLINE_16074_16074_0 for table: airlineStats_OFFLINE
Refreshed segment: airlineStats_OFFLINE_16074_16074_0 for table: airlineStats_OFFLINE
Message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 completed.
Handled request from 127.0.0.1 POST http://localhost:9002/v2/segments?tableName=airlineStats&tableType=OFFLINE, content-type multipart/form-data; charset=ISO-8859-1; boundary=httpclient_boundary_8c757373-6c11-4c17-a0c3-2d3fad879992 status code 200 OK
Delete message cb7bee95-dde7-41c9-a0ad-53e4a26e6381 from zk!
message finished: cb7bee95-dde7-41c9-a0ad-53e4a26e6381, took 11
Message: cb7bee95-dde7-41c9-a0ad-53e4a26e6381 (parent: null) handling task for airlineStats_OFFLINE:airlineStats_OFFLINE_16074_16074_0 completed at: 1763551164877, results: true. FrameworkTime: 11 ms; HandlerTime: 1 ms.
Delete message e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7 from zk!
message finished: e3a32ea1-8025-40c7-ac3f-7bfb9f2bf8a7, took 10
- With skipCrcCheckOnLoad disabled on the billing table (the default):
Handling message: ZnRecord=c5bb3f12-3937-4761-99ed-b7833f02789b, {CREATE_TIMESTAMP=1763552941563, EXECUTE_START_TIMESTAMP=1763552941592, MSG_ID=c5bb3f12-3937-4761-99ed-b7833f02789b, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_TYPE=USER_DEFINE_MSG, PARTITION_NAME=billing_OFFLINE_0, RESOURCE_NAME=billing_OFFLINE, RETRY_COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME=Server_127.0.0.1_7050, TGT_SESSION_ID=1005c43051f0018, TIMEOUT=-1, segmentName=billing_OFFLINE_0, tableName=billing_OFFLINE}{}{}, Stat=Stat {_version=0, _creationTime=1763552941572, _modifiedTime=1763552941572, _ephemeralOwner=0}
CallbackHandler26, Subscribing to path: /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES took: 0
START: GenericClusterController.onMessage() for cluster QuickStartCluster
Replacing segment: billing_OFFLINE_0 in table: billing_OFFLINE
Replacing segment: billing_OFFLINE_0
CallbackHandler28, Subscribing to path: /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES took: 0
END: GenericClusterController.onMessage() for cluster QuickStartCluster
144 END:INVOKE CallbackHandler 26, /QuickStartCluster/INSTANCES/Broker_127.0.0.1_8000/MESSAGES listener: org.apache.helix.controller.GenericHelixController@382d549a type: CALLBACK Took: 0ms
START: InstanceMessagesCache.refresh()
The latency of message dd9dfcb1-3f9c-4c3e-b009-9ba5f24ef8bb is 30 ms
END: InstanceMessagesCache.refresh(), 2 of Messages read from ZooKeeper. took 1 ms.
Start to refresh stale message cache
Waiting for lock to reload/refresh: billing_OFFLINE_0, queue-length: 0
Acquired lock to reload/refresh segment: billing_OFFLINE_0 (lock-time=0ms, queue-length=0)
Replacing segment: billing_OFFLINE_0 because its CRC has changed from: 2262953635 to: 2262953636
Downloading and loading segment: billing_OFFLINE_0
Downloading segment: billing_OFFLINE_0 from: http://127.0.0.1:9002/segments/billing/billing_OFFLINE_0
Acquiring instance level segment download semaphore for segment: billing_OFFLINE_0, queue-length: 0
Acquired instance level segment download semaphore for segment: billing_OFFLINE_0 (lock-time=0ms, queue-length=0).
Also tested on our test clusters with realtime tables using a Lucene index and upsert compaction.
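The two runs above boil down to a single decision at segment load time. A minimal sketch of that decision (method and parameter names are illustrative, not the actual Pinot code in BaseTableDataManager):

```java
public class CrcCheckSketch {
  // Illustrative decision: replace the local segment only when the CRC differs
  // and the table has not opted out via skipCrcCheckOnLoad.
  static boolean shouldReplaceSegment(long localCrc, long zkCrc, boolean skipCrcCheckOnLoad) {
    if (localCrc == zkCrc) {
      return false; // CRCs match: nothing to do
    }
    // CRC changed: skip the download/replace only if the table opted out
    return !skipCrcCheckOnLoad;
  }

  public static void main(String[] args) {
    // airlineStats run: CRC changed but flag enabled -> segment replacement skipped
    System.out.println(shouldReplaceSegment(2686808396L, 2686808519L, true));  // false
    // billing run: CRC changed and flag disabled -> download and replace
    System.out.println(shouldReplaceSegment(2262953635L, 2262953636L, false)); // true
  }
}
```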
Codecov Report
:x: Patch coverage is 78.26087% with 15 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.26%. Comparing base (8de6fa9) to head (aeec6d0).
:warning: Report is 239 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../pinot/core/data/manager/BaseTableDataManager.java | 75.80% | 8 Missing and 7 partials :warning: |
Additional details and impacted files
```
@@ Coverage Diff @@
##             master   #17249    +/-   ##
============================================
+ Coverage     63.20%   63.26%   +0.05%
- Complexity     1428     1476      +48
============================================
  Files          3117     3167      +50
  Lines        184391   189109    +4718
  Branches      28278    28933     +655
============================================
+ Hits         116552   119646    +3094
- Misses        58829    60184    +1355
- Partials       9010     9279     +269
```
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (ø) | |
| integration | 100.00% <ø> (ø) | |
| integration1 | 100.00% <ø> (ø) | |
| integration2 | 0.00% <ø> (ø) | |
| java-11 | 63.22% <78.26%> (+0.03%) | :arrow_up: |
| java-21 | 63.22% <78.26%> (+0.07%) | :arrow_up: |
| temurin | 63.26% <78.26%> (+0.05%) | :arrow_up: |
| unittests | 63.26% <78.26%> (+0.05%) | :arrow_up: |
| unittests1 | 55.62% <78.26%> (-0.48%) | :arrow_down: |
| unittests2 | 34.01% <8.69%> (+0.35%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
:umbrella: View full report in Codecov by Sentry.
Since segment management is done via a state machine, the only way to get consistent behavior is to always ignore the CRC check when loading a segment, which also means segment refresh (in-place replacement) won't be supported. Do you see this as an okay trade-off? If so, let's clearly document the behavior.
@Jackie-Jiang this gap is already exposed as an instance-level config. Going forward:
- Do we want to ensure the skip is only applied on the add and reload paths, and detect a replace even when the server is down and treat it as such?
- Or do we allow skipping the CRC check on all three paths, as is the current behavior, and document the gaps for the user?
- Or do we attempt add/replacement on a best-effort basis and serve from the local copy on failure (same as reload), but actively alert and repair?

My take: we need a combination of options 1 and 3, not just as a short-term fix but as a fallback for reliability. Suggested fixes:
- Detect a replace even when the server is down, and treat it as a replace rather than an add
- Try add/replacement on a best-effort basis and serve from the local copy on failure; this will be behind the flag that lets us skip the CRC check right now.

Pushing the changes for (2); I'll look into (1) in a follow-up PR.
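The best-effort behavior in suggested fix 2 can be sketched as follows (all names are hypothetical; the real logic would live in BaseTableDataManager):

```java
import java.util.function.Supplier;

public class BestEffortReplaceSketch {
  /**
   * Attempt the download-and-replace; on failure, keep serving the existing
   * local copy and raise an alert instead of failing the segment load.
   * Returns true if the replacement succeeded.
   */
  static boolean replaceOrServeLocal(Supplier<Boolean> downloadAndReplace, Runnable alert) {
    try {
      return downloadAndReplace.get();
    } catch (RuntimeException e) {
      alert.run();   // surface the failure so it can be repaired
      return false;  // fall back to the local copy
    }
  }

  public static void main(String[] args) {
    boolean replaced = replaceOrServeLocal(
        () -> { throw new RuntimeException("download failed"); },
        () -> System.out.println("ALERT: replace failed, serving local copy"));
    System.out.println(replaced); // false
  }
}
```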
After #17380, do you still see this change needed?
I think this change is still needed. We should handle the already-introduced CRC skip more gracefully without limiting in-place replacement. Making it safer before deprecating the configuration feels like the right approach.