[ENH] Add support for non-UTC timezone-aware scalars

Open charlesbluca opened this issue 2 years ago • 0 comments

Is your feature request related to a problem? Please describe.

While working on #1184, I noticed that test_filter_cast_timestamp was passing misleadingly, and that we currently don't support queries like:

import pandas as pd
from dask_sql import Context

c = Context()
c.create_table(
    "df",
    pd.DataFrame({
        "d": pd.date_range(start="2014-08-01 09:00", freq="8H", periods=6, tz="Europe/Berlin")
    })
)

c.sql("select * from df where d >= timestamp '2014-08-01 23:00:00+00'")

This query would require us to support timezone-aware datetime scalars in timezones other than UTC, which we currently do not.

Describe the solution you'd like

Two potential solutions that I think might work here:

  • adding support for non-UTC datetimes on the Python end
  • modifying the construction of the logical plan so that, in situations like this, non-UTC datetime columns are localized to some common timezone that scalars support; for example, the offending plan above looks like:
TableScan: df projection=[d], full_filters=[df.d >= TimestampNanosecond(1406934000000000000, Some("Europe/Berlin"))]

It would pass if we changed it to something like:

TableScan: df projection=[d], full_filters=[CAST(df.d AS Timestamp(Nanosecond, None)) >= TimestampNanosecond(1406934000000000000, None)]
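To illustrate the idea behind the second option, here is a minimal pandas-only sketch (it does not use dask-sql or the planner). It assumes the cast to `Timestamp(Nanosecond, None)` keeps the underlying UTC instant and only drops the timezone label; whether the actual cast behaves that way would need to be confirmed. The variable names are illustrative, and `freq` is passed as a `pd.Timedelta` only to avoid the alias differences between pandas versions.

```python
import pandas as pd

# Same column as in the reproducer: 8-hour steps in Europe/Berlin local time.
df = pd.DataFrame({
    "d": pd.date_range(
        start="2014-08-01 09:00",
        freq=pd.Timedelta(hours=8),
        periods=6,
        tz="Europe/Berlin",
    )
})

# The SQL literal timestamp '2014-08-01 23:00:00+00' as a UTC-aware scalar.
literal = pd.Timestamp("2014-08-01 23:00:00", tz="UTC")

# Reference result: pandas compares timezone-aware values by instant,
# even when the column and the scalar carry different timezones.
expected = df[df["d"] >= literal]

# Normalized comparison (assumption: mirrors the proposed plan):
# convert both sides to UTC, then drop the timezone label.
d_naive = df["d"].dt.tz_convert("UTC").dt.tz_localize(None)
literal_naive = literal.tz_convert("UTC").tz_localize(None)
normalized = df[d_naive >= literal_naive]

# Both approaches select the same rows.
assert normalized.index.equals(expected.index)
```

Under this assumption the naive scalar holds the same nanosecond value that appears in both plans above (1406934000000000000), so only the timezone annotation changes, not the instant being compared.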

charlesbluca, Jun 30 '23 20:06