Case sensitive in NativeHiveTableScanBase

Open. Flyangz opened this issue 5 months ago • 1 comment

Describe the bug

If a SQL query uses an uppercase column name (e.g., SELECT NAME FROM my_table) and the scan node is HiveTableScanExec, the requestedAttributes in HiveTableScanExec keep the column name as "NAME". The following code then fails to match it against the actual lowercase column name "name":

https://github.com/apache/auron/blob/208024d01019de0079f263020282420f32cb3508/spark-extension/src/main/scala/org/apache/spark/sql/hive/execution/auron/plan/NativeHiveTableScanBase.scala#L73

This scenario is uncommon in Auron, as NativeHiveTableScanBase is not frequently used. We encountered it because our internal version of Auron supports other data sources. Auron might run into this bug when using Paimon, or in other future scenarios that use NativeHiveTableScanBase.

To Reproduce

In vanilla Spark 3.2, HiveTableScanExec.requestedAttributes retains "K" instead of "k":

spark.sql("drop table if exists test.my_table")
spark.sql(
  """
    |create table test.my_table (
    |    k string,
    |    v string
    |) stored as textfile
    |""".stripMargin)
spark.sql(
  """
    |INSERT INTO test.my_table VALUES('a', 'b')
    |""".stripMargin)
spark.sql("select K from test.my_table").show()

Expected behavior

NativeHiveTableScanBase.nativeFileSchema should handle uppercase column names the same way NativeFileSourceScanBase's does.
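For illustration only, here is a minimal sketch of the kind of case-insensitive matching the expected behavior implies. It is not the Auron implementation: `Field`, `CaseInsensitiveLookup`, and `resolve` are hypothetical names, and plain Scala types stand in for Spark's `StructField` and `Attribute` so the snippet runs without Spark. It assumes case-insensitive analysis, which is Spark's default (`spark.sql.caseSensitive=false`).

```scala
import java.util.Locale

// Hypothetical stand-in for a schema field; not an Auron or Spark type.
final case class Field(name: String, dataType: String)

object CaseInsensitiveLookup {
  // Normalize with Locale.ROOT so the result does not depend on the JVM's default locale.
  private def normalize(name: String): String = name.toLowerCase(Locale.ROOT)

  // Resolve each requested column name against the table schema, ignoring case.
  // Returns the schema's own field (with its original spelling), or None if nothing matches.
  def resolve(tableSchema: Seq[Field], requested: Seq[String]): Seq[Option[Field]] = {
    val byLowerName: Map[String, Field] =
      tableSchema.map(f => normalize(f.name) -> f).toMap
    requested.map(name => byLowerName.get(normalize(name)))
  }
}
```

With the table from the reproduction, `CaseInsensitiveLookup.resolve(Seq(Field("k", "string"), Field("v", "string")), Seq("K"))` resolves the requested "K" to the schema field "k". Note that this sketch keeps only one field if two schema fields differ solely by case, so it would not be suitable when `spark.sql.caseSensitive` is enabled.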

Flyangz, Nov 06 '25 06:11

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot], Dec 07 '25 02:12