dask-sql
dask-sql copied to clipboard
[BUG] index name "timestamp" not recognized in SQL
The following example has a dataframe with an index named "timestamp". If you want to use the column in a WHERE clause in SQL there is an error.
import dask
from dask_sql import Context
df = dask.datasets.timeseries()
df.head()
c = Context()
c.create_table("mytab", df)
result = c.sql("""
SELECT count(*)
FROM mytab
WHERE "timestamp" > '2000-01-01 00:00:00'
""")
print(result.compute())
The Error Message is:
Traceback (most recent call last):
File "/opt/dask_sql/startup_script.py", line 15, in <module>
result = c.sql("""
File "/opt/dask_sql/dask_sql/context.py", line 458, in sql
rel, select_names, _ = self._get_ral(sql)
File "/opt/dask_sql/dask_sql/context.py", line 892, in _get_ral
raise ParsingException(sql, str(e.message())) from None
dask_sql.utils.ParsingException: Can not parse the given SQL: From line 4, column 15 to line 4, column 25: Column 'timestamp' not found in any table
The problem is probably somewhere here:
SELECT count(*)
FROM timeseries
WHERE "timestamp" > '2000-01-01'
^^^^^^^^^^^
I can "fix" this situation by renaming the index
df = df.reset_index().rename(columns={"timestamp": "tsp"}).set_index("tsp")
c.create_table("mytab", df)
result = c.sql("""
SELECT count(*)
FROM mytab
WHERE "tsp" > '2000-01-01 00:00:00'
""")
print(result.compute())
This still results in an error message, but just related to a datatype mismatch:
TypeError('Invalid comparison between dtype=datetime64[ns] and datetime')
For the examples above I used this docker image nbraun/dask-sql:2022.1.0.