[BUG] Capital letters does not work for table references
As seen in the documentation, the following code works.
SELECT
lhs.name, lhs.id, lhs.x
FROM
data AS lhs
However, if I write lhs like LHS and write the query with an uppercase reference like below;
SELECT
LHS.name, LHS.id, LHS.x
FROM
data AS LHS
Then, I get the following invalid qualifier error;
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/baturayofluoglu/workspace/soda-core/.venv/lib/python3.9/site-packages/dask_sql/context.py", line 493, in sql
rel, _ = self._get_ral(sql)
File "/Users/baturayofluoglu/workspace/soda-core/.venv/lib/python3.9/site-packages/dask_sql/context.py", line 811, in _get_ral
raise ParsingException(sql, str(pe)) from None
dask_sql.utils.ParsingException: Plan("Invalid qualifier LHS")
Environment:
- dask-sql version: 2022.10.1
- Python version: 3.9
- Operating System: macos
- Install method: pip
Would it be possible to add a code example to reproduce this issue locally? Doing the following, I wasn't able to reproduce the error:
mamba create -n caps-table-reference python=3.9
mamba activate caps-table-reference
pip install dask-sql==2022.10.1
from dask_sql import Context
from dask.datasets import timeseries
c = Context()
c.create_table("data", timeseries())
c.sql("""
SELECT
LHS.name, LHS.id, LHS.x
FROM
data AS LHS
""")
Sorry for the misleading description. Indeed, my example in the description works. However, the following code raises the error that I mentioned above.
from dask_sql import Context
from dask.datasets import timeseries
c = Context()
df = dask.datasets.timeseries().reset_index()
c.create_table("timeseries", df)
c.sql("""
SELECT
LHS.*
FROM
timeseries AS LHS
""").compute()
If I write lhs.* then it works as shown below
c.sql("""
SELECT
lhs.*
FROM
timeseries AS LHS
""").compute()
Thanks for the reproducer! It looks like the underlying issue here is in DataFusion, which specifically for wildcard expressions seems to be parsing the qualifier as case-sensitive, though the table reference is case-insensitive, hence the behavior above:
$ datafusion-cli
DataFusion CLI v14.0.0
❯ CREATE EXTERNAL TABLE foo STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.003 seconds.
❯ select df.* from foo as df;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1 | 2 |
+----------+----------+
1 row in set. Query took 0.002 seconds.
❯ select df.* from foo as DF;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1 | 2 |
+----------+----------+
❯ select DF.* from foo as DF;
Plan("Invalid qualifier DF")
Looks like we'll either want to look at fields_with_qualified, which checks for fields prefixed with the qualifier, or DFField, the struct storing the case-sensitive qualifier.
cc @andygrove in case you know of any existing issues looking into this