[BUG]: Incorrect result for `rolling` with experimental streaming executor and multiple partitions

Open TomAugspurger opened this issue 7 months ago • 3 comments

Describe the bug

The test python/cudf_polars/tests/expressions/test_rolling.py::test_rolling_datetime fails with a small blocksize.

Steps/Code to reproduce bug

import polars as pl
from cudf_polars.testing.asserts import assert_gpu_result_equal

dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
df = (
    pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]})
    .with_columns(pl.col("dt").str.strptime(pl.Datetime("ns")))
    .lazy()
)
q = df.with_columns(pl.sum("a").rolling(index_column="dt", period="2d"))
engine = pl.GPUEngine(
    executor="streaming",
    executor_options={"max_rows_per_partition": 3},
)
assert_gpu_result_equal(q, engine=engine)

fails with

AssertionError: DataFrames are different (value mismatch for column 'a')
[left]:  [3, 10, 15, 24, 11, 1]
[right]: [3, 10, 15, 9, 11, 1]
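
The mismatch is consistent with the rolling sum being evaluated separately on each 3-row partition. The following is a minimal pure-Python sketch of that hypothesis, not the cudf-polars implementation: `rolling_sum` and `partitioned_rolling_sum` are hypothetical helpers, and the window is assumed to be `(t - period, t]`, which matches Polars' default `closed="right"`.

```python
from datetime import datetime, timedelta

dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
values = [3, 7, 5, 9, 2, 1]
times = [datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates]
period = timedelta(days=2)


def rolling_sum(times, values, period):
    """For each t, sum the values whose timestamp lies in (t - period, t]."""
    return [
        sum(v for s, v in zip(times, values) if t - period < s <= t)
        for t in times
    ]


def partitioned_rolling_sum(times, values, period, max_rows):
    """Hypothetical failure mode: apply rolling_sum to each chunk in isolation."""
    out = []
    for start in range(0, len(times), max_rows):
        stop = start + max_rows
        out.extend(rolling_sum(times[start:stop], values[start:stop], period))
    return out


whole = rolling_sum(times, values, period)
split = partitioned_rolling_sum(times, values, period, max_rows=3)
print(whole)  # [3, 10, 15, 24, 11, 1]
print(split)  # [3, 10, 15, 9, 11, 1]
```

Only the fourth row differs: it is the first row of the second partition, so its 2-day window can no longer see the three rows from 2020-01-01 that sit in the first partition.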

Expected behavior

No exception

TomAugspurger · Jun 12 '25 21:06

This probably affects window operations too, e.g. those in python/cudf_polars/tests/test_window_functions.py::test_rolling[agg_expr0-2d]

TomAugspurger · Jun 12 '25 21:06

This should just concat first and then run "in-memory" (because we haven't implemented partitioned rolling). I wonder if rolling is somehow not being concatted and lowered?

wence- · Jun 13 '25 16:06

> I wonder if rolling is somehow not being concatted and lowered?

We are not scrutinizing `HStack.columns` the way we do `Select.exprs`. This is definitely a bug, but it shouldn't be hard to fix.

rjzamora · Jun 13 '25 19:06