pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Observed issues with check.crc.on.segment.load flag

Open cypherean opened this issue 3 months ago • 4 comments

The recently added check.crc.on.segment.load flag (#16432) has been helpful for handling realtime segments with mismatched CRCs despite identical data. However, during testing we observed a couple of side effects when this flag is enabled at the server level:

Offline table backfill: In our setup (Uber), offline segment names are date based. When backfilling for an older date, the uploaded segment has the same name but a different CRC. With check.crc.on.segment.load disabled, the controller updates to the new CRC, but the server does not load the updated segment.

// Steps to replicate:
// 1. Start quickstart batch with config set to false
// 2. update the crc of a completed segment locally
// 3. use the upload segment script to re-upload the existing segment

Handling message: ZnRecord=57211ff0-eb7d-4227-86dd-1d634a37a961, {CREATE_TIMESTAMP=1763398232543, EXECUTE_START_TIM ESTAMP=1763398232574, MSG_ID=57211ff0-eb7d-4227-86dd-1d634a37a961, MSG_STATE=new, MSG_SUBTYPE=REFRESH_SEGMENT, MSG_ TYPE=USER_DEFINE_MSG, PARTITION_NAME=airlineStats_OFFLINE_16071_16071_0, RESOURCE_NAME=airlineStats_OFFLINE, RETRY_ COUNT=0, SRC_CLUSTER=QuickStartCluster, SRC_INSTANCE_TYPE=PARTICIPANT, SRC_NAME=Controller_127.0.0.1_9002, TGT_NAME =Server_127.0.0.1_7050, TGT_SESSION_ID=10056a43eb20018, TIMEOUT=-1, segmentName=airlineStats_OFFLINE_16071_16071_0, tableName=airlineStats_OFFLINE}H}, Stat=Stat {_version=0, _creationTime=1763398232552,_modifiedTime=17633982325
52, _ephemeralOwner=0}
Replacing segment: airlineStats_OFFLINE_16071_16071_0 in table: airlineStats_OFFLINE
Replacing segment: airlineStats_OFFLINE_16071_16071_0
.....
Waiting for lock to reload/refresh: airlineStats_OFFLINE_16071_16071_0, queue-length: 0
Acquired lock to reload/refresh segment: airlineStats_OFFLINE_16071_16071_0 (lock-time=0ms, queue-length=0)
Skipping replacing segment: airlineStats_OFFLINE_16071_16071_0 even though its CRC has changed from: 912377941 to:
912377942 because instance.check.crc.on.segment.load is set to false
Message 57211ff0-eb7d-4227-86dd-1d634a37a961 completed.

Upsert compaction: When upsert compaction is enabled, Pinot attempts to commit a compacted segment that retains the same name but has a different CRC. This segment is also ignored by the server when the check.crc.on.segment.load flag is disabled.

// With upsert compaction enabled on testing table
Skipping replacing segment: redacted_tablename__14__458__20251118T1630Z even though its CRC has changed from: 3709360175 to: 533870250 because instance.check.crc.on.segment.load is set to false

These behaviors suggest that skipping CRC check may inadvertently block valid segment reloads in scenarios where segment names remain constant but CRCs legitimately change (backfill, compaction).

cypherean avatar Nov 18 '25 18:11 cypherean

Relates to https://github.com/apache/pinot/issues/11004

I think it will be much safer to move this to a table level config so it can be enabled for tables where replica CRCs can deviate rather apply at cluster level that will have side effect on batch uploads and minion segment upload.

rohityadav1993 avatar Nov 18 '25 19:11 rohityadav1993

@Jackie-Jiang , not sure what is the long term plan for CRC mismatch issue, but would it make sense to have a CRC for just data part and not indexes?

rohityadav1993 avatar Nov 18 '25 19:11 rohityadav1993

+1 , I think a data-only CRC instead of completely skipping CRC check might be better here. There are still more corner cases with data only CRCs as well. eg. using ingestion transformations like now() causes the data to mismatch as well across replicas, which will lead to false-positive segment downloads from deep-store as well.

We've internally switched to a table-level configuration for skipCRC temporarily, so we can work around enabling this for offline tables. But, a good long-term fix is needed for the CRC mismatch issue.

anuragrai16 avatar Nov 20 '25 09:11 anuragrai16

Instead of upstreaming the skip CRC check - I would propose replacing it with a data only CRC. Added details in this issue. , which can keep it both backward compatible and safe from other external index implementation defined in Pinot in the future.

anuragrai16 avatar Nov 24 '25 04:11 anuragrai16