[BUG]: Incorrect result for `rolling` with experimental streaming executor and multiple partitions
Describe the bug
The test python/cudf_polars/tests/expressions/test_rolling.py::test_rolling_datetime fails
with the experimental streaming executor when a small blocksize splits the input into multiple partitions.
Steps/Code to reproduce bug
import polars as pl
from cudf_polars.testing.asserts import assert_gpu_result_equal

dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
df = (
    pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]})
    .with_columns(pl.col("dt").str.strptime(pl.Datetime("ns")))
    .lazy()
)
q = df.with_columns(pl.sum("a").rolling(index_column="dt", period="2d"))
assert_gpu_result_equal(
    q,
    engine=pl.GPUEngine(
        executor="streaming",
        executor_options={"max_rows_per_partition": 3},
    ),
)
This fails with:
AssertionError: DataFrames are different (value mismatch for column 'a')
[left]: [3, 10, 15, 24, 11, 1]
[right]: [3, 10, 15, 9, 11, 1]
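The two results differ only in the fourth row (24 vs 9), which is the first row of the second 3-row chunk. A minimal pure-Python sketch, assuming the rolling window is (t - 2d, t] and that the streaming executor evaluated the expression on consecutive 3-row partitions independently, reproduces both lists. The helper names `rolling_sum` and `partitioned_rolling_sum` are made up for this illustration and are not part of polars or cudf_polars.

```python
from datetime import datetime, timedelta

# Data copied from the reproducer above.
dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
values = [3, 7, 5, 9, 2, 1]
times = [datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates]


def rolling_sum(times, values, period):
    """Sum the values whose timestamp falls in the window (t - period, t]."""
    return [
        sum(v for u, v in zip(times, values) if t - period < u <= t)
        for t in times
    ]


def partitioned_rolling_sum(times, values, period, max_rows):
    """Apply rolling_sum to each chunk of max_rows rows independently.

    This models the suspected bug: rows in earlier chunks are invisible
    to the windows of later chunks.
    """
    out = []
    for start in range(0, len(times), max_rows):
        stop = start + max_rows
        out.extend(rolling_sum(times[start:stop], values[start:stop], period))
    return out


period = timedelta(days=2)
whole_frame = rolling_sum(times, values, period)
per_partition = partitioned_rolling_sum(times, values, period, max_rows=3)
print(whole_frame)    # [3, 10, 15, 24, 11, 1]
print(per_partition)  # [3, 10, 15, 9, 11, 1]
```

Under these assumptions the whole-frame result matches [left] and the per-partition result matches [right], which is consistent with the rolling sum being computed per partition without seeing rows from the previous partition.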
Expected behavior
No exception: the result from the streaming executor should match the CPU result.
This probably affects window operations too, e.g. those in python/cudf_polars/tests/test_window_functions.py::test_rolling[agg_expr0-2d]
This should just concat first and then run "in-memory" (because we haven't implemented partitioned rolling). I wonder if rolling is somehow not being concatted and lowered?
> I wonder if rolling is somehow not being concatted and lowered?
We are not scrutinizing HStack.columns the way we do Select.exprs. This is definitely a bug, but it shouldn't be hard to fix.
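To make the diagnosis concrete, here is a toy sketch of the kind of check being discussed. It is not the cudf_polars lowering code: `Expr`, `Node`, and `needs_single_partition` are hypothetical stand-ins, and the sketch assumes that `with_columns` maps to an HStack-like node and that rolling counts as a non-pointwise expression that requires concatenating partitions first.

```python
from dataclasses import dataclass, field

# Toy model only: these classes are hypothetical and do not mirror the
# real cudf_polars IR or its lowering logic.


@dataclass
class Expr:
    name: str
    pointwise: bool  # False for rolling/window-style expressions


@dataclass
class Node:
    kind: str  # "Select" or "HStack" in this toy
    exprs: list = field(default_factory=list)


def needs_single_partition(node, checked_kinds):
    """Return True if the node must be concatenated before evaluation."""
    if node.kind not in checked_kinds:
        # Expressions on this kind of node are never inspected.
        return False
    return any(not e.pointwise for e in node.exprs)


rolling = Expr("sum(a).rolling(...)", pointwise=False)
select = Node("Select", [rolling])
hstack = Node("HStack", [rolling])  # assumed result of with_columns

# Behaviour as described in this thread: only Select is inspected.
buggy = {"Select"}
# Hypothetical fix: inspect HStack as well.
fixed = {"Select", "HStack"}

print(needs_single_partition(select, buggy))  # True
print(needs_single_partition(hstack, buggy))  # False
print(needs_single_partition(hstack, fixed))  # True
```

In this toy, the rolling expression inside the HStack node slips through the check and would be evaluated per partition, while the same expression inside a Select node triggers the concat-then-in-memory fallback.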