pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Infer parquet reader type based on file metadata (wip)

Open saurabhd336 opened this issue 3 years ago • 2 comments

Instructions:

  1. The PR has to be tagged with at least one of the following labels (*):
    1. feature
    2. bugfix
    3. performance
    4. ui
    5. backward-incompat
    6. release-notes (**)
  2. Remove these instructions before publishing the PR.

(*) Other labels to consider:

  • testing
  • dependencies
  • docker
  • kubernetes
  • observability
  • security
  • code-style
  • extension-point
  • refactor
  • cleanup

(**) Use release-notes label for scenarios like:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed

saurabhd336 avatar Aug 29 '22 17:08 saurabhd336

Can you add a sample data file with a decimal field and a test to ensure the file is correctly parsed?

xiangfu0 avatar Aug 30 '22 06:08 xiangfu0

Codecov Report

Merging #9294 (ed4d8e4) into master (d1a71d8) will decrease coverage by 2.78%. The diff coverage is 46.56%.

:exclamation: Current head ed4d8e4 differs from pull request most recent head 711add8. Consider uploading reports for the commit 711add8 to get more accurate results

@@             Coverage Diff              @@
##             master    #9294      +/-   ##
============================================
- Coverage     69.82%   67.04%   -2.79%     
- Complexity     4696     4824     +128     
============================================
  Files          1873     1391     -482     
  Lines         99623    72184   -27439     
  Branches      15146    11583    -3563     
============================================
- Hits          69564    48396   -21168     
+ Misses        25118    20266    -4852     
+ Partials       4941     3522    -1419     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.04% <46.56%> (-0.08%) :arrow_down:
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...aggregation/function/AggregationFunctionUtils.java 80.00% <ø> (+0.51%) :arrow_up:
...not/query/planner/logical/RelToStageConverter.java 73.91% <ø> (ø)
.../columnminmaxvalue/ColumnMinMaxValueGenerator.java 73.68% <0.00%> (ø)
...a/org/apache/pinot/segment/spi/ColumnMetadata.java 80.00% <0.00%> (-20.00%) :arrow_down:
...java/org/apache/pinot/segment/spi/V1Constants.java 12.50% <ø> (ø)
...ator/transform/function/BaseTransformFunction.java 46.58% <20.00%> (-4.92%) :arrow_down:
...java/org/apache/pinot/core/common/DataFetcher.java 77.81% <42.30%> (-11.93%) :arrow_down:
...e/pinot/core/transport/InstanceRequestHandler.java 60.15% <60.00%> (-4.69%) :arrow_down:
.../java/org/apache/pinot/query/QueryEnvironment.java 82.75% <80.00%> (-5.00%) :arrow_down:
...ator/transform/function/CastTransformFunction.java 84.00% <100.00%> (+13.35%) :arrow_up:
... and 735 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Aug 30 '22 09:08 codecov-commenter

The test failure might be related:

2022-08-31T07:17:28.7248119Z [ERROR] Failures: 
2022-08-31T07:17:28.7249092Z [ERROR]   ParquetRecordReaderTest.testComparison:105->testComparison:125 expected [false] but found [true]

Jackie-Jiang avatar Aug 31 '22 22:08 Jackie-Jiang

The test failure might be related:

2022-08-31T07:17:28.7248119Z [ERROR] Failures: 
2022-08-31T07:17:28.7249092Z [ERROR]   ParquetRecordReaderTest.testComparison:105->testComparison:125 expected [false] but found [true]

@Jackie-Jiang ACK. Fixed the test and added a new test to validate file metadata based reader selection

saurabhd336 avatar Sep 01 '22 05:09 saurabhd336

Can you please modify the PR description to include the new config key for the record reader config? Also update the Pinot doc where applicable

Jackie-Jiang avatar Sep 01 '22 18:09 Jackie-Jiang