Add a framework to validate each of the Ingestion Transformation functions

Open deepthi912 opened this issue 3 months ago • 1 comment

This PR adds a validation framework for the Pinot transform functions used in ingestion configs. It adds datatype validation hooks to the TransformFunction interface, which individual functions can implement to validate their configuration during table creation.

We can include validationMode in the transform function specification:

  1. LEGACY Mode (Default)
     • Purpose: Backward compatibility; allows all existing type conversions
     • Behavior: No validation, accepts everything (current Pinot behavior)
     • Use Case: Existing tables that shouldn't break
  2. LENIENT Mode (Recommended)
     • Purpose: Safe type conversions allowed
     • Behavior: Allows safe conversions like INT→LONG and FLOAT→DOUBLE, but blocks unsafe ones like STRING→INT
     • Use Case: New tables that want some safety but also flexibility
  3. STRICT Mode
     • Purpose: Maximum type safety
     • Behavior: No automatic type conversions; exact type matching required
     • Use Case: Critical tables where type safety is paramount
{
  "columnName": "processed_age",
  "transformFunction": "CAST(age_string AS INT)",
  "validationMode": "STRICT"
}
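
The three modes can be read as a small decision rule over a source and target type. The following is a minimal sketch of that rule only, not the code in this PR: the class `ValidationModeSketch`, its nested enums, and the method `isConversionAllowed` are hypothetical names, and the set of "safe" conversions contains just the two examples named above (INT→LONG, FLOAT→DOUBLE).

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the three validation modes; not Pinot's actual API.
final class ValidationModeSketch {
  enum ValidationMode { LEGACY, LENIENT, STRICT }

  // A small stand-in for a column data type enum (assumption for this sketch).
  enum DataType { INT, LONG, FLOAT, DOUBLE, STRING }

  // Conversions treated as "safe" here: only the two examples from the description.
  private static final Map<DataType, Set<DataType>> SAFE_CONVERSIONS = new EnumMap<>(DataType.class);

  static {
    SAFE_CONVERSIONS.put(DataType.INT, EnumSet.of(DataType.LONG));
    SAFE_CONVERSIONS.put(DataType.FLOAT, EnumSet.of(DataType.DOUBLE));
  }

  private ValidationModeSketch() {
  }

  // Returns true if converting `from` to `to` is accepted under `mode`.
  static boolean isConversionAllowed(ValidationMode mode, DataType from, DataType to) {
    switch (mode) {
      case LEGACY:
        // No validation: every conversion is accepted.
        return true;
      case STRICT:
        // Exact type match required.
        return from == to;
      case LENIENT:
        // Exact match, or one of the listed safe conversions.
        return from == to
            || SAFE_CONVERSIONS.getOrDefault(from, EnumSet.noneOf(DataType.class)).contains(to);
      default:
        throw new IllegalArgumentException("Unknown validation mode: " + mode);
    }
  }
}
```

Under this sketch, a STRING→INT conversion is accepted only in LEGACY mode, while INT→LONG is accepted in LEGACY and LENIENT but not in STRICT. How the PR treats an explicit CAST such as the one in the example config is not stated in the description, so the sketch does not model it.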

deepthi912 avatar Oct 18 '25 03:10 deepthi912

Codecov Report

:x: Patch coverage is 52.63158% with 9 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 63.45%. Comparing base (1b6866d) to head (3b4410f).
:warning: Report is 237 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...operator/transform/function/TransformFunction.java    0.00%     7 Missing :warning:
...ot/spi/config/table/ingestion/IngestionConfig.java    33.33%    2 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17039      +/-   ##
============================================
+ Coverage     56.42%   63.45%   +7.03%     
- Complexity      702     1419     +717     
============================================
  Files          2406     3084     +678     
  Lines        133681   182157   +48476     
  Branches      21260    27953    +6693     
============================================
+ Hits          75424   115596   +40172     
- Misses        51983    57647    +5664     
- Partials       6274     8914    +2640     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (?)
integration           100.00% <ø> (+100.00%) :arrow_up:
integration1          100.00% <ø> (?)
integration2          0.00% <ø> (ø)
java-11               63.42% <52.63%> (+7.04%) :arrow_up:
java-21               63.42% <52.63%> (+7.02%) :arrow_up:
temurin               63.45% <52.63%> (+7.03%) :arrow_up:
unittests             63.45% <52.63%> (+7.03%) :arrow_up:
unittests1            56.30% <47.36%> (-0.13%) :arrow_down:
unittests2            33.61% <52.63%> (?)

codecov-commenter avatar Oct 18 '25 03:10 codecov-commenter