pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Set skipUnparseableLines to true by default in CSV reader

Open KKcorps opened this issue 1 year ago • 1 comments

Needed to ensure we don't fail in .hasNext() when CSV has corrupt records

KKcorps avatar Oct 25 '24 12:10 KKcorps

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 55.44%. Comparing base (59551e4) to head (2819f23). Report is 1245 commits behind head on master.

:exclamation: There is a different number of reports uploaded between BASE (59551e4) and HEAD (2819f23). Click for more details.

HEAD has 7 uploads less than BASE
Flag BASE (59551e4) HEAD (2819f23)
skip-bytebuffers-false 7 6
unittests 5 3
java-11 5 4
unittests2 3 0
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14311      +/-   ##
============================================
- Coverage     61.75%   55.44%   -6.31%     
- Complexity      207      797     +590     
============================================
  Files          2436     2096     -340     
  Lines        133233   109872   -23361     
  Branches      20636    17377    -3259     
============================================
- Hits          82274    60920   -21354     
+ Misses        44911    44126     -785     
+ Partials       6048     4826    -1222     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) :arrow_up:
integration 100.00% <ø> (+99.99%) :arrow_up:
integration1 100.00% <ø> (+99.99%) :arrow_up:
integration2 0.00% <ø> (ø)
java-11 55.35% <100.00%> (-6.36%) :arrow_down:
java-21 55.29% <100.00%> (-6.33%) :arrow_down:
skip-bytebuffers-false 55.43% <100.00%> (-6.31%) :arrow_down:
skip-bytebuffers-true 55.22% <100.00%> (+27.50%) :arrow_up:
temurin 55.44% <100.00%> (-6.31%) :arrow_down:
unittests 55.44% <100.00%> (-6.31%) :arrow_down:
unittests1 55.44% <100.00%> (+8.55%) :arrow_up:
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Oct 25 '24 13:10 codecov-commenter

This is backward incompatible change. Ideally continueOnError should be able to skip the record

Jackie-Jiang avatar Oct 27 '24 23:10 Jackie-Jiang

Do we need this after #14309?

Jackie-Jiang avatar Oct 27 '24 23:10 Jackie-Jiang

@Jackie-Jiang 14309 won't handle the case where .hasNext() itself fails. I understand this is backward incompatible. I think the main thing we want to do is use lineIterator by default instead of one from csv lib.

KKcorps avatar Oct 28 '24 11:10 KKcorps

I don't fully follow the change. When LineIterator is used, it can still throw exception on hasNext(). What is the main difference between these 2 approaches?

Jackie-Jiang avatar Oct 28 '24 19:10 Jackie-Jiang

@Jackie-Jiang the difference is this doesn't throw exception when the line is corrupt there still might be rare case when it fails but its much much safer than existing iterator

the existing one dies as soon as it comes across a line that it cannot parse, we ran into this error and it affected the preview

KKcorps avatar Oct 30 '24 03:10 KKcorps