Make upsert compaction task more robust to crc mismatch
Pushing multiple fixes / enhancements in this patch together for faster reviews.
Enhancement 1
Solves Scenario 1 of https://github.com/apache/pinot/issues/13491 where Segment ZK Metadata CRC = Segment CRC deepstore != ValidDocID Bitmap CRC was happening.
The change here is to iterate through all peer servers and see if we can find atleast 1 host with matching CRCs and proceed with the task execution. Assumption is that we have one host which was the leader in updating ZK and uploading to deepstore as well.
We also fail the task-execution if we don't find any such host instead of silently skipping and succeeding the task which we were doing previously.
I tried this fix out for one of our tables and the tasks which were blocked initially due to this issue, it successfully executed compaction after this change and we saw a drop in the row-count and size.
Enhancement 2
Solves https://github.com/apache/pinot/issues/13492. The controller fetches validDocIDBitmap from one of the replica servers and compares the CRC in the response against the segment ZK metadata CRC. If they don't match, the segment is not considered for compaction.
The fix is similar to the previous solution. We already retrieve all the bitmaps from all servers. Instead of randomly considering just one, we now iterate through all the bitmaps for each segment. If any of them match the ZK CRC, the segment is then considered for compaction.
I deployed this fix for one of our tables, which unblocked many segments for compaction. Attached is a screenshot showing the drop in rows.
Codecov Report
Attention: Patch coverage is 25.86207% with 43 lines in your changes missing coverage. Please review.
Project coverage is 62.11%. Comparing base (
59551e4) to head (90b1123). Report is 742 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #13489 +/- ##
============================================
+ Coverage 61.75% 62.11% +0.35%
+ Complexity 207 198 -9
============================================
Files 2436 2558 +122
Lines 133233 140960 +7727
Branches 20636 21868 +1232
============================================
+ Hits 82274 87553 +5279
- Misses 44911 46793 +1882
- Partials 6048 6614 +566
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration1 | <0.01% <0.00%> (-0.01%) |
:arrow_down: |
| integration2 | 0.00% <0.00%> (ø) |
|
| java-11 | 62.06% <25.86%> (+0.35%) |
:arrow_up: |
| java-21 | 61.98% <25.86%> (+0.36%) |
:arrow_up: |
| skip-bytebuffers-false | 62.09% <25.86%> (+0.34%) |
:arrow_up: |
| skip-bytebuffers-true | 61.96% <25.86%> (+34.23%) |
:arrow_up: |
| temurin | 62.11% <25.86%> (+0.35%) |
:arrow_up: |
| unittests | 62.10% <25.86%> (+0.36%) |
:arrow_up: |
| unittests1 | 46.67% <ø> (-0.22%) |
:arrow_down: |
| unittests2 | 27.67% <25.86%> (-0.06%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.