datafusion-comet
Parquet scan NATIVE_DATAFUSION and NATIVE_ICEBERG_COMPAT fail to read uint8, uint16 negative values correctly
Describe the bug
Multiple unit tests fail because the scan returns nulls for uint8 and uint16 values whose underlying int32 representation is negative. This is most likely due to https://github.com/apache/arrow-rs/issues/7040, which has more details.
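A rough sketch of the symptom, in plain Python rather than the actual arrow-rs or Comet code. It assumes the reader applies a range-checked cast from the int32 physical value to the unsigned logical type, so an out-of-range value becomes null; that is my reading of the behavior, not something confirmed here. The helper names `checked_cast` and `wrapping_cast` are hypothetical and exist only for illustration.

```python
from typing import Optional

# Bit widths of the unsigned logical types discussed in this issue.
UINT_BITS = {"uint8": 8, "uint16": 16}


def checked_cast(physical: int, logical: str) -> Optional[int]:
    """Hypothetical range-checked cast: out-of-range values become None (null)."""
    bits = UINT_BITS[logical]
    if 0 <= physical < (1 << bits):
        return physical
    return None


def wrapping_cast(physical: int, logical: str) -> int:
    """Hypothetical alternative: keep only the low bits of the int32 value."""
    bits = UINT_BITS[logical]
    return physical & ((1 << bits) - 1)


# int32 physical values as they might appear in a badly formed file.
stored = [0, 127, 255, -1, -128]

checked = [checked_cast(v, "uint8") for v in stored]
wrapped = [wrapping_cast(v, "uint8") for v in stored]

print(checked)  # [0, 127, 255, None, None]
print(wrapped)  # [0, 127, 255, 255, 128]
```

Under this assumption, the negative int32 values are the ones that surface as nulls, whereas a reader that truncates to the low bits would return 255 and 128 instead.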
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response
The values originally written by Comet are illegal, due to https://github.com/apache/parquet-java/issues/3142
We currently fall back to Spark if the schema contains byte or short unless a Comet config is enabled, so this isn't an urgent bug to fix.
There has been no progress on this in the community, so it may remain like this for a while. The general consensus seemed to be in favor of not changing anything (because the original test file was badly formed).