pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Add post-partial-upsert transforms for derived columns

Open xiangfu0 opened this issue 3 months ago • 1 comments

This pull request adds support for applying post-update transform functions after partial upsert merges in Pinot. This enables derived columns to be recomputed based on the latest merged row, enhancing the flexibility and correctness of partial upsert tables. The changes introduce a new mechanism to configure and execute these post-upsert transforms, update the core upsert logic to invoke them, and add tests to validate the new behavior.

Partial Upsert Post-Update Transform Support

  • Added a new method getPostPartialUpsertTransformers in RecordTransformerUtils to create a list of transformers that are applied after a partial upsert merge, based on new post-update transform configs in UpsertConfig.
  • Updated PartialUpsertHandler to accept TableConfig and initialize post-update transformers, and to apply these transformers to the merged row after the upsert merge is performed. [1] [2] [3]

Expression Transformer Enhancements

  • Modified ExpressionTransformer to support an overwriteExistingValues mode, which allows transforms to run even if the column already has a value—necessary for post-upsert transforms. [1] [2] [3]
  • Improved null value handling: when a transform produces a null, the field is now explicitly marked as a null value in the row. [1] [2]

Testing

  • Added a new integration test testPartialUpsertPostUpdateTransforms to verify that post-upsert transforms correctly recompute derived columns after partial upsert merges.
  • Added unit tests to ensure the correct marking of null values when transforms produce nulls, both when the field is empty and when it already has a value.

These changes collectively make partial upsert tables more robust and allow for advanced use cases where derived columns must be kept up-to-date after merge operations.

xiangfu0 avatar Dec 03 '25 09:12 xiangfu0

Codecov Report

:x: Patch coverage is 92.22222% with 7 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.17%. Comparing base (7305eec) to head (77b5db3).

Files with missing lines Patch % Lines
...local/recordtransformer/ExpressionTransformer.java 95.38% 0 Missing and 3 partials :warning:
...ocal/recordtransformer/RecordTransformerUtils.java 85.71% 1 Missing and 1 partial :warning:
...rg/apache/pinot/spi/config/table/UpsertConfig.java 60.00% 1 Missing and 1 partial :warning:
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #17308   +/-   ##
=========================================
  Coverage     63.16%   63.17%           
  Complexity     1479     1479           
=========================================
  Files          3173     3173           
  Lines        189917   190001   +84     
  Branches      29064    29089   +25     
=========================================
+ Hits         119970   120025   +55     
- Misses        60621    60654   +33     
+ Partials       9326     9322    -4     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.13% <92.22%> (-0.01%) :arrow_down:
java-21 63.13% <92.22%> (+0.01%) :arrow_up:
temurin 63.17% <92.22%> (+<0.01%) :arrow_up:
unittests 63.16% <92.22%> (+<0.01%) :arrow_up:
unittests1 55.52% <46.66%> (-0.01%) :arrow_down:
unittests2 34.03% <92.22%> (+<0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Dec 03 '25 10:12 codecov-commenter