datafusion-comet
`native_datafusion/native_iceberg_compat` scans case sensitive
Describe the bug
Currently, `native_datafusion` and `native_iceberg_compat` scans resolve column names case-sensitively, which is inconsistent with vanilla Spark, where `spark.sql.caseSensitive` defaults to `false`.
test case:
test("test V1 parquet scan uses native_iceberg_compat -- case insensitive") {
  withTempPath { path =>
    spark.range(10).toDF("a").write.parquet(path.toString)
    Seq(CometConf.SCAN_NATIVE_DATAFUSION, CometConf.SCAN_NATIVE_ICEBERG_COMPAT).foreach(scanMode => {
      withSQLConf(CometConf.COMET_NATIVE_SCAN_IMPL.key -> scanMode) {
        sql("create table test (A long) using parquet options (path '" + path + "')")
        val df = sql("select A from test")
        checkSparkAnswer(df)
      }
    })
  }
}
error:
== Results ==
!== Correct Answer - 10 ==   == Spark Answer - 10 ==
 struct<A:bigint>            struct<A:bigint>
![0]                         [null]
![1]                         [null]
![2]                         [null]
![3]                         [null]
![4]                         [null]
![5]                         [null]
![6]                         [null]
![7]                         [null]
![8]                         [null]
![9]                         [null]
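To illustrate why the values come back as null, here is a minimal sketch in plain Scala with no Spark dependency. `ColumnResolution.resolve` is a hypothetical helper, not Comet's or Spark's actual API; it only models the lookup of a requested column name against the column names stored in the file. The assumption is that a reader which finds no matching file column fills the requested column with nulls.

```scala
// Hypothetical helper for illustration only; not part of Comet, Spark, or DataFusion.
object ColumnResolution {

  /** Finds the file column that matches `requested`, honoring the case-sensitivity flag. */
  def resolve(
      fileColumns: Seq[String],
      requested: String,
      caseSensitive: Boolean): Option[String] =
    if (caseSensitive) fileColumns.find(_ == requested)
    else fileColumns.find(_.equalsIgnoreCase(requested))

  def main(args: Array[String]): Unit = {
    // The Parquet file in the test case was written with column "a".
    val fileColumns = Seq("a")

    // Case-sensitive lookup of "A" finds nothing, so the scan has no data to return.
    println(resolve(fileColumns, "A", caseSensitive = true)) // prints None

    // Case-insensitive lookup maps "A" to the physical column "a".
    println(resolve(fileColumns, "A", caseSensitive = false)) // prints Some(a)
  }
}
```

The sketch does not handle the case where several file columns match the same name when compared case-insensitively; a real reader would need to decide how to treat that.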
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response
Hi @wForget do you plan to follow up on this ticket? Thanks
Mind if I take this? I'm already investigating https://github.com/apache/datafusion-comet/issues/1681 and I think it has a similar problem.
Thank you, feel free to send a PR.
Isn't this addressed by #1575 ?
As noted in https://github.com/apache/datafusion-comet/pull/1575#discussion_r2016533206, the schema adapter is not applied to the pushed-down filter (the row group filter), so the filter is still case-sensitive.
https://github.com/apache/datafusion/blob/18feb8b2702b96a8a77ec4bc52fb67571e857d4d/datafusion/datasource-parquet/src/opener.rs#L185-L218
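One way to picture a fix is to rewrite the column references in the filter to the physical file column names before the filter is pushed down. The following is a rough sketch in plain Scala under that assumption. The `Predicate` types and `FilterRewrite.toFileColumns` are hypothetical and only model the idea; they are not DataFusion's or Comet's types, and the actual fix may look quite different.

```scala
// Hypothetical model of a pushed-down predicate; not DataFusion's or Comet's types.
sealed trait Predicate
final case class GreaterThan(column: String, value: Long) extends Predicate
final case class And(left: Predicate, right: Predicate) extends Predicate

object FilterRewrite {

  /**
   * Rewrites every column reference in `p` to the matching physical file column,
   * comparing names case-insensitively. References without a match are left unchanged.
   */
  def toFileColumns(p: Predicate, fileColumns: Seq[String]): Predicate = p match {
    case GreaterThan(column, value) =>
      val physical = fileColumns.find(_.equalsIgnoreCase(column)).getOrElse(column)
      GreaterThan(physical, value)
    case And(left, right) =>
      And(toFileColumns(left, fileColumns), toFileColumns(right, fileColumns))
  }
}
```

With a file column `a`, a filter written against `A` would then be evaluated against the statistics of `a`, for example `FilterRewrite.toFileColumns(GreaterThan("A", 5L), Seq("a"))` yields `GreaterThan("a", 5L)`.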
These Spark SQL test failures with native_iceberg_compat are possibly related to this issue:
- Spark native readers should respect spark.sql.caseSensitive - parquet *** FAILED *** (440 milliseconds)
- SPARK-31116: Select nested schema with case insensitive mode *** FAILED *** (695 milliseconds)