dask-sql icon indicating copy to clipboard operation
dask-sql copied to clipboard

[BUG] Capital letters does not work for table references

Open baturayo opened this issue 3 years ago • 3 comments

As seen in the documentation, the following code works.

SELECT
     lhs.name, lhs.id, lhs.x
  FROM
     data AS lhs

However, if I write lhs like LHS and write the query with an uppercase reference like below;

SELECT
     LHS.name, LHS.id, LHS.x
  FROM
     data AS LHS

Then, I get the following invalid qualifier error;

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/baturayofluoglu/workspace/soda-core/.venv/lib/python3.9/site-packages/dask_sql/context.py", line 493, in sql
    rel, _ = self._get_ral(sql)
  File "/Users/baturayofluoglu/workspace/soda-core/.venv/lib/python3.9/site-packages/dask_sql/context.py", line 811, in _get_ral
    raise ParsingException(sql, str(pe)) from None
dask_sql.utils.ParsingException: Plan("Invalid qualifier LHS")

Environment:

  • dask-sql version: 2022.10.1
  • Python version: 3.9
  • Operating System: macos
  • Install method: pip

baturayo avatar Nov 13 '22 21:11 baturayo

Would it be possible to add a code example to reproduce this issue locally? Doing the following, I wasn't able to reproduce the error:

mamba create -n caps-table-reference python=3.9
mamba activate caps-table-reference
pip install dask-sql==2022.10.1
from dask_sql import Context
from dask.datasets import timeseries

c = Context()
c.create_table("data", timeseries())

c.sql("""
    SELECT
        LHS.name, LHS.id, LHS.x
    FROM
        data AS LHS
""")

charlesbluca avatar Nov 14 '22 17:11 charlesbluca

Sorry for the misleading description. Indeed, my example in the description works. However, the following code raises the error that I mentioned above.

from dask_sql import Context
from dask.datasets import timeseries

c = Context()
df = dask.datasets.timeseries().reset_index()
c.create_table("timeseries", df)
c.sql("""
    SELECT
        LHS.*
    FROM
        timeseries AS LHS
""").compute()

If I write lhs.* then it works as shown below

c.sql("""
    SELECT
        lhs.*
    FROM
        timeseries AS LHS
""").compute()

baturayo avatar Nov 14 '22 23:11 baturayo

Thanks for the reproducer! It looks like the underlying issue here is in DataFusion, which specifically for wildcard expressions seems to be parsing the qualifier as case-sensitive, though the table reference is case-insensitive, hence the behavior above:

$ datafusion-cli 
DataFusion CLI v14.0.0
❯ CREATE EXTERNAL TABLE foo STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.003 seconds.
❯ select df.* from foo as df;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+
1 row in set. Query took 0.002 seconds.
❯ select df.* from foo as DF;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+
❯ select DF.* from foo as DF;
Plan("Invalid qualifier DF")

Looks like we'll either want to look at fields_with_qualified, which checks for fields prefixed with the qualifier, or DFField, the struct storing the case-sensitive qualifier.

cc @andygrove in case you know of any existing issues looking into this

charlesbluca avatar Nov 15 '22 04:11 charlesbluca