Parquet scan NATIVE_DATAFUSION and NATIVE_ICEBERG_COMPAT fail to read uint8, uint16 negative values correctly

Open parthchandra opened this issue 10 months ago • 3 comments

Describe the bug

Multiple unit tests fail because the scan returns nulls for uint8 and uint16 values whose stored int32 representation is negative. This is most likely due to https://github.com/apache/arrow-rs/issues/7040, which has more details.
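
For illustration, here is a minimal sketch of the suspected failure mode. It assumes the uint8 column is stored with an int32 physical type (as the Parquet format describes for `UINT_8`) and that the reader applies a range-checked cast that turns out-of-range values into nulls. The function names are hypothetical and are not taken from Comet or arrow-rs; the wrapping variant is shown only for contrast.

```python
from typing import List, Optional


def checked_cast_to_uint8(physical: int) -> Optional[int]:
    """Range-checked cast: values outside [0, 255] become None (null)."""
    return physical if 0 <= physical <= 0xFF else None


def wrapping_cast_to_uint8(physical: int) -> int:
    """Wrapping cast: keep the low 8 bits and treat them as unsigned."""
    return physical & 0xFF


# Hypothetical int32 values as they might sit in the file for a UINT_8 column.
# The negative entries are the "illegal" ones discussed in this issue.
stored: List[int] = [0, 127, 255, -1, -128]

checked = [checked_cast_to_uint8(v) for v in stored]
wrapped = [wrapping_cast_to_uint8(v) for v in stored]

print(checked)
print(wrapped)
```

With a checked cast the negative entries come back as nulls, which matches the symptom in the failing tests; a wrapping cast would instead reinterpret the low bits as an unsigned value.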

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

parthchandra avatar Jan 28 '25 18:01 parthchandra

The original values written by Comet are illegal due to https://github.com/apache/parquet-java/issues/3142

parthchandra avatar Jan 31 '25 19:01 parthchandra

We currently fall back to Spark if the schema contains byte or short unless a Comet config is enabled, so this isn't an urgent bug to fix.

andygrove avatar Apr 17 '25 20:04 andygrove

There has been no progress on this in the community, so it may remain like this for a while. The general consensus seemed to be towards not changing anything (because the original test file was badly formed).

parthchandra avatar Apr 17 '25 20:04 parthchandra