dask-sql
[ENH] Add support for non-UTC timezone-aware scalars
Is your feature request related to a problem? Please describe.
While working on #1184, I noticed that we were getting a misleading test pass on test_filter_cast_timestamp, and that we currently don't support queries like:
import pandas as pd
from dask_sql import Context

c = Context()
c.create_table(
    "df",
    pd.DataFrame({
        "d": pd.date_range(start="2014-08-01 09:00", freq="8H", periods=6, tz="Europe/Berlin")
    })
)
c.sql("select * from df where d >= timestamp '2014-08-01 23:00:00+00'")
This query fails because it would require us to support timezone-aware datetime scalars in timezones other than UTC, which we currently don't.
Describe the solution you'd like
Two potential solutions I think might work here:
- adding support for non-UTC datetimes on the Python end
- modifying the construction of the logical plan so that, in situations like this, non-UTC datetime columns are localized to some common timezone that scalars support; for example, the offending plan for the query above looks like:
TableScan: df projection=[d], full_filters=[df.d >= TimestampNanosecond(1406934000000000000, Some("Europe/Berlin"))]
But the query would pass if we were to change the plan to something like:
TableScan: df projection=[d], full_filters=[CAST(df.d AS Timestamp(Nanosecond, None)) >= TimestampNanosecond(1406934000000000000, None)]
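To illustrate what the rewritten plan is meant to compute, here is a rough sketch in plain pandas, outside of dask-sql. It assumes the cast to a timezone-naive timestamp should keep the underlying UTC instant (convert to UTC, then drop the timezone); whether the planner's CAST behaves this way is an assumption, not something confirmed here. The variable names are illustrative only.

```python
import pandas as pd

# Same data as in the failing example: a tz-aware column in Europe/Berlin.
# A Timedelta is used for freq to avoid depending on frequency alias spelling.
df = pd.DataFrame({
    "d": pd.date_range(
        start="2014-08-01 09:00",
        freq=pd.Timedelta(hours=8),
        periods=6,
        tz="Europe/Berlin",
    )
})

# The scalar from the query, expressed in UTC.
scalar_utc = pd.Timestamp("2014-08-01 23:00:00+00:00")

# Assumed equivalent of CAST(df.d AS Timestamp(Nanosecond, None)):
# normalize the column to UTC, then drop the timezone.
naive_col = df["d"].dt.tz_convert("UTC").dt.tz_localize(None)

# Drop the timezone from the scalar too, keeping its UTC wall-clock time.
naive_scalar = scalar_utc.tz_localize(None)

# Both sides are now timezone-naive, so no non-UTC scalar is needed.
filtered = df[naive_col >= naive_scalar]

# Reference result: pandas can compare tz-aware values across timezones directly.
expected = df[df["d"] >= scalar_utc]
```

Under these assumptions `filtered` and `expected` select the same rows, which suggests that comparing on a common naive representation preserves the semantics of the original filter as long as both sides are normalized to the same timezone first.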