pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Add single column transformer

Open krishan1390 opened this issue 3 months ago • 1 comments

  1. ColumnReaderTransformer wraps a columnReader and applies transform functions efficiently. It avoids Object conversion unless transformation is neccessary
  2. Created ColumnTransformer interface and 2 implementations - DataTypeColumnTransformer and NullValueColumnTransformer
  3. Modify DefaultValueColumnReader to always return null so that NullValueColumnTransformer takes effect. Clients need to use ColumnReaderTransformer as the wrapper for this

Objective is for above interfaces to handle single column transformations.

Multi column transformations needs to be done separately via different interface.

This depends on https://github.com/apache/pinot/pull/17293 for new methods in the ColumnReader interface

krishan1390 avatar Dec 02 '25 11:12 krishan1390

Codecov Report

:x: Patch coverage is 89.85507% with 14 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 63.25%. Comparing base (95d43c0) to head (94d7228). :warning: Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...l/columntransformer/DataTypeColumnTransformer.java 83.78% 2 Missing and 4 partials :warning:
.../segment/local/utils/DataTypeTransformerUtils.java 90.32% 3 Missing and 3 partials :warning:
.../columntransformer/NullValueColumnTransformer.java 85.71% 0 Missing and 1 partial :warning:
...segment/local/utils/NullValueTransformerUtils.java 96.00% 0 Missing and 1 partial :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17304      +/-   ##
============================================
- Coverage     63.28%   63.25%   -0.03%     
  Complexity     1474     1474              
============================================
  Files          3154     3158       +4     
  Lines        188007   188056      +49     
  Branches      28782    28793      +11     
============================================
- Hits         118977   118957      -20     
- Misses        59807    59879      +72     
+ Partials       9223     9220       -3     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.23% <89.85%> (-0.04%) :arrow_down:
java-21 63.22% <89.85%> (+<0.01%) :arrow_up:
temurin 63.25% <89.85%> (-0.03%) :arrow_down:
unittests 63.25% <89.85%> (-0.03%) :arrow_down:
unittests1 55.65% <47.10%> (-0.04%) :arrow_down:
unittests2 33.93% <89.13%> (-0.01%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Dec 23 '25 08:12 codecov-commenter

Please edit the PR description. It refers to the changes that were present in the previous commits.

9aman avatar Dec 25 '25 08:12 9aman

@krishan1390, can you add a plan how all the transformers will be supported in the future, current implementation is too narrow in scope and doesn't discuss long-term usability.

Also, give a brief description on the flows that will be affected/utilise the columnTransformer and if this is configurable? Segments will eventually be loaded on servers and server can detect to reprocess a segment based on the latest schema and config so it should be safe to opt out of data type transformation and be a config in segment builder.

rohityadav1993 avatar Dec 29 '25 07:12 rohityadav1993