dask-sql icon indicating copy to clipboard operation
dask-sql copied to clipboard

[BUG] index name "timestamp" not recognized in SQL

Open markope opened this issue 3 years ago • 0 comments

The following example has a dataframe with an index named "timestamp". If you want to use the column in a WHERE clause in SQL there is an error.

import dask
from dask_sql import Context

df = dask.datasets.timeseries()

df.head()

c = Context()
c.create_table("mytab", df)

result = c.sql("""
        SELECT count(*)
        FROM mytab
        WHERE "timestamp" > '2000-01-01 00:00:00'
    """)
print(result.compute())

The Error Message is:

Traceback (most recent call last):
  File "/opt/dask_sql/startup_script.py", line 15, in <module>
    result = c.sql("""
  File "/opt/dask_sql/dask_sql/context.py", line 458, in sql
    rel, select_names, _ = self._get_ral(sql)
  File "/opt/dask_sql/dask_sql/context.py", line 892, in _get_ral
    raise ParsingException(sql, str(e.message())) from None
dask_sql.utils.ParsingException: Can not parse the given SQL: From line 4, column 15 to line 4, column 25: Column 'timestamp' not found in any table

The problem is probably somewhere here:

	
	        SELECT count(*)
	        FROM timeseries
	        WHERE "timestamp" > '2000-01-01'
	              ^^^^^^^^^^^

I can "fix" this situation by renaming the index

df = df.reset_index().rename(columns={"timestamp": "tsp"}).set_index("tsp")

c.create_table("mytab", df)

result = c.sql("""
        SELECT count(*)
        FROM mytab
        WHERE "tsp" > '2000-01-01 00:00:00'
    """)
print(result.compute())

This still results in an error message, but just related to a datatype mismatch:

TypeError('Invalid comparison between dtype=datetime64[ns] and datetime')

For the examples above I used this docker image nbraun/dask-sql:2022.1.0.

markope avatar Mar 09 '22 18:03 markope