[BUG] Case sensitive=True config doesn't work as intended
What happened:
When setting sql.identifier.case_sensitive=True dask-sql still ends up converting identifiers to lowercase during the planning stage.
What you expected to happen:
Case sensitivity being honored when set to True which is the default.
This needs https://github.com/apache/arrow-datafusion/issues/5626 to go in after which the planner can be instantiated with the proper enable_ident_normalization value. https://github.com/dask-contrib/dask-sql/blob/11fda749d4dc4c729350a1a1e87c0ea631f11bce/dask_planner/src/sql.rs#L548
Minimal Complete Verifiable Example:
df = pd.DataFrame({"id": [0, 1], "VAL": [1, 2]})
c.create_table("test", df)
c.sql("select ID from test") # works even though ID should be treated differently to `id`
c.sql("select val from test") # fails since the schema has `VAL` but the planner is looking for `val`.
Another instance of this reported by @goodwanghan in https://github.com/dask-contrib/dask-sql/issues/1108#issue-1653305345
import pandas as pd
import dask.dataframe as dd
from dask_sql import Context
df = pd.read_parquet("https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2023-01.parquet")
ctx = Context()
res = ctx.sql(
"SELECT PULocationID, COUNT(*) AS ct FROM df GROUP BY PULocationID ORDER BY ct DESC LIMIT 5",
dataframes={"df":dd.from_pandas(df,npartitions=2)},
config_options={"sql.identifier.case_sensitive":True}
)