
group_by_dynamic with offset computes wrong window starts when a DST time change happens just before the 1st window

Open michelbl opened this issue 3 months ago • 2 comments

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

from datetime import datetime, timedelta, UTC

import polars as pl

# Spring forward: Europe/Paris switches from CET to CEST on 2023-03-26.
print(
    pl.DataFrame(
        data={
            "t": pl.Series(
                [
                    datetime(2023, 3, 26, 14, 56, tzinfo=UTC),
                    datetime(2023, 3, 27, 14, 56, tzinfo=UTC),
                ]
            )
            .dt.cast_time_unit("ms")
            .dt.convert_time_zone("Europe/Paris"),
            "q": [4, 5],
        }
    )
    .set_sorted("t")
    .group_by_dynamic(index_column="t", every="1d", offset=timedelta(hours=6))
    .agg([pl.sum("q").alias("q")])
)

# Fall back: Europe/Paris switches from CEST to CET on 2023-10-29.
print(
    pl.DataFrame(
        data={
            "t": pl.Series(
                [
                    datetime(2023, 10, 29, 14, 56, tzinfo=UTC),
                    datetime(2023, 10, 30, 14, 56, tzinfo=UTC),
                ]
            )
            .dt.cast_time_unit("ms")
            .dt.convert_time_zone("Europe/Paris"),
            "q": [4, 5],
        }
    )
    .set_sorted("t")
    .group_by_dynamic(index_column="t", every="1d", offset=timedelta(hours=6))
    .agg([pl.sum("q").alias("q")])
)

Log output

No output

Issue description

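When a DST transition ("spring forward" or "fall back") falls between local midnight and the first window start, every window start is offset incorrectly. With offset=timedelta(hours=6), the windows should start at 06:00 local time, but they start at 07:00 after the spring transition and at 05:00 after the fall transition: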

shape: (2, 2)
┌────────────────────────────┬─────┐
│ t                          ┆ q   │
│ ---                        ┆ --- │
│ datetime[ms, Europe/Paris] ┆ i64 │
╞════════════════════════════╪═════╡
│ 2023-03-26 07:00:00 CEST   ┆ 4   │
│ 2023-03-27 07:00:00 CEST   ┆ 5   │
└────────────────────────────┴─────┘
shape: (2, 2)
┌────────────────────────────┬─────┐
│ t                          ┆ q   │
│ ---                        ┆ --- │
│ datetime[ms, Europe/Paris] ┆ i64 │
╞════════════════════════════╪═════╡
│ 2023-10-29 05:00:00 CET    ┆ 4   │
│ 2023-10-30 05:00:00 CET    ┆ 5   │
└────────────────────────────┴─────┘

Expected behavior

shape: (2, 2)
┌────────────────────────────┬─────┐
│ t                          ┆ q   │
│ ---                        ┆ --- │
│ datetime[ms, Europe/Paris] ┆ i64 │
╞════════════════════════════╪═════╡
│ 2023-03-26 06:00:00 CEST   ┆ 4   │
│ 2023-03-27 06:00:00 CEST   ┆ 5   │
└────────────────────────────┴─────┘
shape: (2, 2)
┌────────────────────────────┬─────┐
│ t                          ┆ q   │
│ ---                        ┆ --- │
│ datetime[ms, Europe/Paris] ┆ i64 │
╞════════════════════════════╪═════╡
│ 2023-10-29 06:00:00 CET    ┆ 4   │
│ 2023-10-30 06:00:00 CET    ┆ 5   │
└────────────────────────────┴─────┘
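For comparison, here is a minimal sketch in plain Python (standard library only, no Polars) of two ways to apply a 6-hour offset to local midnight. The helper names `wall_clock_start` and `absolute_start` are hypothetical and exist only for this illustration. The sketch assumes that the expected behavior corresponds to adding the offset on the local wall clock, and it shows that adding the offset as an absolute duration happens to reproduce the 07:00 and 05:00 values observed above. That match is one plausible reading of the output, not a confirmed diagnosis of how Polars computes window starts.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")
OFFSET = timedelta(hours=6)  # same offset as in the reproducible example


def wall_clock_start(day: date) -> datetime:
    """Local midnight plus the offset, added on the local wall clock."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=PARIS)
    # Arithmetic on an aware datetime keeps its tzinfo and works on
    # wall-clock fields, so a DST change in between does not shift the result.
    return midnight + OFFSET


def absolute_start(day: date) -> datetime:
    """Local midnight plus the offset, added as elapsed time in UTC."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=PARIS)
    # Going through UTC counts real elapsed hours, so a DST change
    # between midnight and the result moves the local wall-clock time.
    return (midnight.astimezone(timezone.utc) + OFFSET).astimezone(PARIS)


for day in (date(2023, 3, 26), date(2023, 10, 29)):
    print(
        day,
        "wall clock:", wall_clock_start(day).strftime("%H:%M %Z"),
        "| absolute:", absolute_start(day).strftime("%H:%M %Z"),
    )
```

On both transition dates the wall-clock variant lands on 06:00 local time, which matches the expected output, while the absolute variant lands on 07:00 CEST in March and 05:00 CET in October, which matches the actual output.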

Installed versions

--------Version info---------
Polars:               0.20.23
Index type:           UInt32
Platform:             Linux-6.5.0-28-generic-x86_64-with-glibc2.35
Python:               3.11.4 (main, Jun 26 2023, 15:13:33) [GCC 11.3.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.10.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
nest_asyncio:         1.5.8
numpy:                1.26.2
openpyxl:             3.1.2
pandas:               2.1.3
pyarrow:              11.0.0
pydantic:             2.5.1
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.23
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

michelbl · Apr 30 '24 08:04