pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Revert Upsert Metadata of a segment during Inconsistencies

Open deepthi912 opened this issue 3 months ago • 1 comments

Continuation to: https://github.com/apache/pinot/pull/17144

This PR is a long term fix without having to disable reload on consuming segment/force commit for Partial Upserts and Tables with dropOutOfOrder=true. In this PR, the following solution is implemented to revert the metadata of the current segment incases of inconsistencies. Note that, we do not fix any inconsistencies that has happened in the past, the inconsistencies will only be fixed starting from now.

During Consumption:

  • We keep track of the current segment's pk -> RecordLocation of the previous location that is existing only in the other segment

During segment replacement when the metadata inconsistencies are detected:

  • Update the existing pk -> Record Location map with the values in the temporary previous record location map being stored for each consuming segment.
  • Update the valid doc id
  • For newly added keys, remove the entry in the upsert metadata map and from the valid docids.
  • This map is erased once the segment is added/ replaced. The map will be started empty for each new consuming segment.

deepthi912 avatar Dec 04 '25 22:12 deepthi912

Codecov Report

:x: Patch coverage is 58.33333% with 65 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.27%. Comparing base (6182bed) to head (2cd409d). :warning: Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...tionUpsertMetadataManagerForConsistentDeletes.java 52.94% 20 Missing and 4 partials :warning:
...cal/upsert/BasePartitionUpsertMetadataManager.java 54.00% 19 Missing and 4 partials :warning:
...t/ConcurrentMapPartitionUpsertMetadataManager.java 67.27% 13 Missing and 5 partials :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17324      +/-   ##
============================================
+ Coverage     63.26%   63.27%   +0.01%     
  Complexity     1474     1474              
============================================
  Files          3161     3161              
  Lines        188390   188556     +166     
  Branches      28822    28851      +29     
============================================
+ Hits         119185   119318     +133     
- Misses        59977    60000      +23     
- Partials       9228     9238      +10     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.24% <58.33%> (+0.02%) :arrow_up:
java-21 55.57% <1.28%> (-7.66%) :arrow_down:
temurin 63.27% <58.33%> (+0.01%) :arrow_up:
unittests 63.27% <58.33%> (+0.01%) :arrow_up:
unittests1 55.61% <1.28%> (-0.06%) :arrow_down:
unittests2 34.00% <58.33%> (+0.03%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Dec 04 '25 23:12 codecov-commenter

I added two maps both corresponding to a consuming segment for better simplicity and to separate the operations of reverting the pks to other segment location vs removing the new primary keys from a consuming segment.

deepthi912 avatar Jan 10 '26 08:01 deepthi912