datafusion-comet

`native_datafusion`/`native_iceberg_compat` scans are case-sensitive

Open • wForget opened this issue 8 months ago • 6 comments

Describe the bug

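Currently, the `native_datafusion` and `native_iceberg_compat` scans resolve column names case-sensitively. This is inconsistent with vanilla Spark, which resolves column names case-insensitively by default (`spark.sql.caseSensitive=false`).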

Test case:

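  test("test V1 parquet scan uses native_iceberg_compat -- case insensitive") {
    withTempPath { path =>
      // Write a Parquet file whose physical column name is lower-case "a"
      spark.range(10).toDF("a").write.parquet(path.toString)
      Seq(CometConf.SCAN_NATIVE_DATAFUSION, CometConf.SCAN_NATIVE_ICEBERG_COMPAT).foreach { scanMode =>
        withSQLConf(CometConf.COMET_NATIVE_SCAN_IMPL.key -> scanMode) {
          // Drop the table after each iteration so the second CREATE TABLE does not fail
          withTable("test") {
            // Declare the column as upper-case "A"; with the default
            // spark.sql.caseSensitive=false, Spark matches it to the file column "a"
            sql("create table test (A long) using parquet options (path '" + path + "')")
            val df = sql("select A from test")
            checkSparkAnswer(df)
          }
        }
      }
    }
  }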

Error:

== Results ==
!== Correct Answer - 10 ==   == Spark Answer - 10 ==
 struct<A:bigint>            struct<A:bigint>
![0]                         [null]
![1]                         [null]
![2]                         [null]
![3]                         [null]
![4]                         [null]
![5]                         [null]
![6]                         [null]
![7]                         [null]
![8]                         [null]
![9]                         [null]
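The nulls above are consistent with a scan that looks up the requested column `A` in the Parquet file schema by exact name, finds nothing (the file only contains `a`), and fills the column with nulls. The sketch below contrasts exact and case-insensitive lookup. `FieldResolver` and `resolve` are hypothetical names for illustration only, not Comet, Spark, or DataFusion APIs, and the ambiguity check is an assumption meant to mirror Spark's rejection of a name that matches more than one file column case-insensitively.

```scala
// Hypothetical helper, not part of Comet or DataFusion: maps a column name
// requested by Spark to the physical column name stored in a Parquet file.
object FieldResolver {
  // Returns Right(Some(name)) on a unique match, Right(None) when the column is
  // absent (a scan would then produce nulls), and Left(error) when the lookup
  // matches more than one physical column.
  def resolve(
      requested: String,
      fileColumns: Seq[String],
      caseSensitive: Boolean): Either[String, Option[String]] = {
    val matches =
      if (caseSensitive) fileColumns.filter(_ == requested)
      else fileColumns.filter(_.equalsIgnoreCase(requested))
    matches match {
      case Seq()       => Right(None)
      case Seq(single) => Right(Some(single))
      case many        => Left(s"Ambiguous column '$requested' matches: ${many.mkString(", ")}")
    }
  }
}
```

With the file from the test case, `FieldResolver.resolve("A", Seq("a"), caseSensitive = true)` returns `Right(None)`, which corresponds to the null column returned by the native scans, while `caseSensitive = false` returns `Right(Some("a"))`, matching the behavior vanilla Spark shows with its default configuration.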

wForget • Mar 27 '25

Hi @wForget do you plan to follow up on this ticket? Thanks

kazuyukitanimura • Apr 08 '25

I will keep working on a fix for this.

wForget • Apr 09 '25

Mind if I take this? I'm already investigating https://github.com/apache/datafusion-comet/issues/1681, and I think it has similar problems.

comphead • May 06 '25

Thank you, feel free to send a PR.

wForget • May 07 '25

Isn't this addressed by #1575?

parthchandra • May 07 '25

As noted in https://github.com/apache/datafusion-comet/pull/1575#discussion_r2016533206, the schema adapter is not applied to the pushed-down filter (the row group filter), so the filter is still case-sensitive.

https://github.com/apache/datafusion/blob/18feb8b2702b96a8a77ec4bc52fb67571e857d4d/datafusion/datasource-parquet/src/opener.rs#L185-L218

wForget • May 08 '25
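The comment above describes a gap where the projected columns are adapted to the file schema but the predicate used for row group pruning still refers to the logical (Spark) column names. One way to picture a fix is to rewrite column references in the predicate to their physical file names before consulting row group statistics. The sketch below is only a model of that idea: `Expr`, `Col`, `Lit`, `Gt`, `RowGroupStats`, and `FilterRewrite` are hypothetical names that do not correspond to DataFusion's schema adapter or pruning APIs, and how a real engine treats a column it cannot find is an assumption here (this sketch conservatively keeps such row groups).

```scala
// Hypothetical model of row group pruning; not DataFusion's real API.
sealed trait Expr
final case class Col(name: String) extends Expr
final case class Lit(value: Long) extends Expr
final case class Gt(left: Expr, right: Expr) extends Expr

// Min/max statistics for one row group, keyed by physical column name.
final case class RowGroupStats(min: Map[String, Long], max: Map[String, Long])

object FilterRewrite {
  // Rewrite logical column names to the physical names in the file, matching
  // case-insensitively; columns with no match are left unchanged.
  def toPhysical(expr: Expr, fileColumns: Seq[String]): Expr = expr match {
    case Col(name) => Col(fileColumns.find(_.equalsIgnoreCase(name)).getOrElse(name))
    case Gt(l, r)  => Gt(toPhysical(l, fileColumns), toPhysical(r, fileColumns))
    case lit: Lit  => lit
  }

  // Returns true if the row group might contain matching rows. When no
  // statistics exist for the column, this sketch keeps the row group.
  def mightMatch(expr: Expr, stats: RowGroupStats): Boolean = expr match {
    case Gt(Col(c), Lit(v)) => stats.max.get(c).forall(_ > v)
    case _                  => true
  }
}
```

For example, with statistics recorded under `a` (max 3), the predicate `Gt(Col("A"), Lit(5L))` finds no statistics under `A` and the row group is kept, whereas after `toPhysical(..., Seq("a"))` the predicate reads `a > 5` and the row group can be pruned.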

These Spark SQL test failures with `native_iceberg_compat` are possibly related to this issue:

  • Spark native readers should respect spark.sql.caseSensitive - parquet *** FAILED *** (440 milliseconds)
  • SPARK-31116: Select nested schema with case insensitive mode *** FAILED *** (695 milliseconds)

andygrove • May 23 '25
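The SPARK-31116 test name suggests it exercises nested schema selection, where case-insensitive matching presumably has to be applied at every level of a struct, not only to top-level column names. A minimal sketch of that idea, assuming a simplified schema model: `Field` and `NestedResolver.resolvePath` are hypothetical names, not Spark, Comet, or DataFusion APIs, and ambiguity handling is left out for brevity.

```scala
// Hypothetical schema model: a field with optional nested children (a struct).
final case class Field(name: String, children: Seq[Field] = Nil)

object NestedResolver {
  // Resolves a column path such as List("S", "X") against the file schema one
  // level at a time, returning the physical path, or None if any segment is
  // missing at its level.
  def resolvePath(
      path: List[String],
      fileFields: Seq[Field],
      caseSensitive: Boolean): Option[List[String]] = path match {
    case Nil => Some(Nil)
    case head :: tail =>
      val found = fileFields.find { f =>
        if (caseSensitive) f.name == head else f.name.equalsIgnoreCase(head)
      }
      // Recurse into the matched field's children and prepend its physical name
      found.flatMap { f =>
        resolvePath(tail, f.children, caseSensitive).map(f.name :: _)
      }
  }
}
```

Against a file schema containing a struct `s` with child `x`, resolving `List("S", "X")` case-insensitively yields `Some(List("s", "x"))`, while a case-sensitive lookup yields `None`, the nested analogue of the null column in the original report.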