Add consensus check before selecting a segment for compaction or dele…
Problem
As discussed in #17337, Segments are incorrectly deleted when any single replica reported zero valid documents, causing permanent data loss during server restarts and async state transitions where replicas had inconsistent validDocIds.
Solution
Added consensus requirement: only delete/compact segments when ALL replicas agree on validDoc counts Prevents race conditions during OFFLINE→ONLINE transitions and tie-breaking logic divergence
Changes
- New hasValidDocConsensus() method to verify replica agreement before operation to select segment for deletion or compaction
Tests
Deployed this change in a test cluster and see this warning message about skipping few segments due to validDocId mismatch.
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__101__181__20251215T2107Z. Expected: 607695, but found: 607791
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__53__142__20251215T0114Z. Expected: 423539, but found: 423645
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Skipping segment rta_tarunm_index__121__155__20251215T0859Z for table rta_tarunm_index_REALTIME - no consensus on validDoc counts across replicas
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__75__184__20251215T2158Z. Expected: 595070, but found: 594974
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__19__171__20251215T1626Z. Expected: 569283, but found: 569224
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] CRC mismatch for segment: rta_tarunm_index__12__188__20251215T2329Z, (segmentZKMetadata=450178614, validDocIdsMetadata=3330760987)
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Skipping segment rta_tarunm_index__27__148__20251215T0504Z for table rta_tarunm_index_REALTIME - no consensus on validDoc counts across replicas
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Segment rta_tarunm_index__26__164__20251215T1301Z marked for deletion - all replicas agree it has zero valid documents
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Skipping segment rta_tarunm_index__25__156__20251215T0922Z for table rta_tarunm_index_REALTIME - no consensus on validDoc counts across replicas
Codecov Report
:x: Patch coverage is 35.71429% with 18 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.30%. Comparing base (042549d) to head (dcee854).
:warning: Report is 3 commits behind head on master.
:white_check_mark: All tests successful. No failed tests found.
Additional details and impacted files
@@ Coverage Diff @@
## master #17352 +/- ##
============================================
- Coverage 63.34% 63.30% -0.04%
- Complexity 1474 1476 +2
============================================
Files 3162 3162
Lines 188591 188611 +20
Branches 28857 28861 +4
============================================
- Hits 119454 119408 -46
- Misses 59891 59957 +66
Partials 9246 9246
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
|
| integration | 100.00% <ø> (ø) |
|
| integration1 | 100.00% <ø> (ø) |
|
| integration2 | 0.00% <ø> (ø) |
|
| java-11 | 63.28% <35.71%> (-0.01%) |
:arrow_down: |
| java-21 | 63.24% <35.71%> (-0.03%) |
:arrow_down: |
| temurin | 63.30% <35.71%> (-0.04%) |
:arrow_down: |
| unittests | 63.30% <35.71%> (-0.04%) |
:arrow_down: |
| unittests1 | 55.59% <100.00%> (-0.02%) |
:arrow_down: |
| unittests2 | 34.05% <35.71%> (-0.02%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
@tibrewalpratik17 @xiangfu0 can you please take a look at this?
@xiangfu0 gentle bump on this.