pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Make upsert compaction task more robust to crc mismatch

Open tibrewalpratik17 opened this issue 1 year ago • 1 comments

Pushing multiple fixes / enhancements in this patch together for faster reviews.

Enhancement 1

Solves Scenario 1 of https://github.com/apache/pinot/issues/13491 where Segment ZK Metadata CRC = Segment CRC deepstore != ValidDocID Bitmap CRC was happening.

The change here is to iterate through all peer servers and see if we can find atleast 1 host with matching CRCs and proceed with the task execution. Assumption is that we have one host which was the leader in updating ZK and uploading to deepstore as well.

We also fail the task-execution if we don't find any such host instead of silently skipping and succeeding the task which we were doing previously.

I tried this fix out for one of our tables and the tasks which were blocked initially due to this issue, it successfully executed compaction after this change and we saw a drop in the row-count and size. Screenshot 2024-06-27 at 4 20 57 PM

Enhancement 2

Solves https://github.com/apache/pinot/issues/13492. The controller fetches validDocIDBitmap from one of the replica servers and compares the CRC in the response against the segment ZK metadata CRC. If they don't match, the segment is not considered for compaction.

The fix is similar to the previous solution. We already retrieve all the bitmaps from all servers. Instead of randomly considering just one, we now iterate through all the bitmaps for each segment. If any of them match the ZK CRC, the segment is then considered for compaction.

I deployed this fix for one of our tables, which unblocked many segments for compaction. Attached is a screenshot showing the drop in rows. Screenshot 2024-06-29 at 1 00 00 AM

tibrewalpratik17 avatar Jun 26 '24 21:06 tibrewalpratik17

Codecov Report

Attention: Patch coverage is 25.86207% with 43 lines in your changes missing coverage. Please review.

Project coverage is 62.11%. Comparing base (59551e4) to head (90b1123). Report is 742 commits behind head on master.

Files Patch % Lines
...che/pinot/plugin/minion/tasks/MinionTaskUtils.java 0.00% 23 Missing :warning:
...upsertcompaction/UpsertCompactionTaskExecutor.java 0.00% 10 Missing :warning:
...t/controller/util/ServerSegmentMetadataReader.java 0.00% 7 Missing :warning:
...psertcompaction/UpsertCompactionTaskGenerator.java 83.33% 2 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13489      +/-   ##
============================================
+ Coverage     61.75%   62.11%   +0.35%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2558     +122     
  Lines        133233   140960    +7727     
  Branches      20636    21868    +1232     
============================================
+ Hits          82274    87553    +5279     
- Misses        44911    46793    +1882     
- Partials       6048     6614     +566     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) :arrow_down:
integration <0.01% <0.00%> (-0.01%) :arrow_down:
integration1 <0.01% <0.00%> (-0.01%) :arrow_down:
integration2 0.00% <0.00%> (ø)
java-11 62.06% <25.86%> (+0.35%) :arrow_up:
java-21 61.98% <25.86%> (+0.36%) :arrow_up:
skip-bytebuffers-false 62.09% <25.86%> (+0.34%) :arrow_up:
skip-bytebuffers-true 61.96% <25.86%> (+34.23%) :arrow_up:
temurin 62.11% <25.86%> (+0.35%) :arrow_up:
unittests 62.10% <25.86%> (+0.36%) :arrow_up:
unittests1 46.67% <ø> (-0.22%) :arrow_down:
unittests2 27.67% <25.86%> (-0.06%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jun 26 '24 22:06 codecov-commenter