datafusion-comet
datafusion-comet copied to clipboard
fix: remove fallback for int8 and int16 types with native Parquet readers
Which issue does this PR close?
Closes #2195, #1348.
Rationale for this change
We previously fell back in a scenario where the native reader produced different results. We've now picked up appropriate DF and arrow-rs dependencies to have apache/arrow#7055.
What changes are included in this PR?
Remove the fallback.
How are these changes tested?
Existing tests.
Running Spark SQL tests locally with iceberg_compat. If that passes, I'll likely want to undo #1376 and try again.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 45.92%. Comparing base (f09f8af) to head (d401b98).
:warning: Report is 691 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #2197 +/- ##
=============================================
- Coverage 56.12% 45.92% -10.20%
- Complexity 976 1219 +243
=============================================
Files 119 164 +45
Lines 11743 13989 +2246
Branches 2251 2369 +118
=============================================
- Hits 6591 6425 -166
- Misses 4012 6525 +2513
+ Partials 1140 1039 -101
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
You may want to undo parts of https://github.com/apache/datafusion-comet/pull/1376 to really test the change. The changes to CometTestBase.getPrimitiveTypesParquetSchema were so that the schema would not contain unsigned types if the reader type was one of the datafusion based readers. Undoing that change will be a primary test.
So instead of getting null on the incorrectly written values we now get different values. So garbage in is still garbage out, but it's new garbage!
I seem to recall that I tested a badly written file with https://github.com/apache/arrow/issues/7055 and did not get garbage. I'll take a look at it again.
Thanks for kicking off this effort @mbutrovich!