Add post-partial-upsert transforms for derived columns
This pull request adds support for applying post-update transform functions after partial upsert merges in Pinot. This enables derived columns to be recomputed based on the latest merged row, enhancing the flexibility and correctness of partial upsert tables. The changes introduce a new mechanism to configure and execute these post-upsert transforms, update the core upsert logic to invoke them, and add tests to validate the new behavior.
Partial Upsert Post-Update Transform Support
- Added a new method
getPostPartialUpsertTransformersinRecordTransformerUtilsto create a list of transformers that are applied after a partial upsert merge, based on new post-update transform configs inUpsertConfig. - Updated
PartialUpsertHandlerto acceptTableConfigand initialize post-update transformers, and to apply these transformers to the merged row after the upsert merge is performed. [1] [2] [3]
Expression Transformer Enhancements
- Modified
ExpressionTransformerto support anoverwriteExistingValuesmode, which allows transforms to run even if the column already has a value—necessary for post-upsert transforms. [1] [2] [3] - Improved null value handling: when a transform produces a null, the field is now explicitly marked as a null value in the row. [1] [2]
Testing
- Added a new integration test
testPartialUpsertPostUpdateTransformsto verify that post-upsert transforms correctly recompute derived columns after partial upsert merges. - Added unit tests to ensure the correct marking of null values when transforms produce nulls, both when the field is empty and when it already has a value.
These changes collectively make partial upsert tables more robust and allow for advanced use cases where derived columns must be kept up-to-date after merge operations.
Codecov Report
:x: Patch coverage is 92.22222% with 7 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.17%. Comparing base (7305eec) to head (77b5db3).
Additional details and impacted files
@@ Coverage Diff @@
## master #17308 +/- ##
=========================================
Coverage 63.16% 63.17%
Complexity 1479 1479
=========================================
Files 3173 3173
Lines 189917 190001 +84
Branches 29064 29089 +25
=========================================
+ Hits 119970 120025 +55
- Misses 60621 60654 +33
+ Partials 9326 9322 -4
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
|
| integration | 100.00% <ø> (ø) |
|
| integration1 | 100.00% <ø> (ø) |
|
| integration2 | 0.00% <ø> (ø) |
|
| java-11 | 63.13% <92.22%> (-0.01%) |
:arrow_down: |
| java-21 | 63.13% <92.22%> (+0.01%) |
:arrow_up: |
| temurin | 63.17% <92.22%> (+<0.01%) |
:arrow_up: |
| unittests | 63.16% <92.22%> (+<0.01%) |
:arrow_up: |
| unittests1 | 55.52% <46.66%> (-0.01%) |
:arrow_down: |
| unittests2 | 34.03% <92.22%> (+<0.01%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.