polars
Closed argument not working as expected on rolling aggregations
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
When the closed='left' argument of rolling_mean() is used together with .over(), the result is not shifted at all.
Reproducible example
import polars as pl
import numpy as np
ids = np.random.randint(0,10,100)
values = np.random.random_sample(100)
df = pl.DataFrame({'id':ids,'value':values})
df.with_column(
    (pl.col('value').rolling_mean(100, min_periods=1, closed='left').over('id')).suffix('_id_100_mean')
)
Expected behavior
One would expect the result to look like the output of the code below, i.e. with the values shifted by 1 within each id. This is how it works for groupby_rolling(); the following code gets the expected result:
df.with_column(
    (pl.col('value').rolling_mean(100, min_periods=1, closed='left').over('id')).suffix('_id_100_mean')
).with_column(
    pl.col('value_id_100_mean').shift(1).over('id')
)
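The expected behaviour described above can be sketched without Polars. The pure-Python helpers below (`rolling_mean_fixed` and `shift` are hypothetical names, not Polars API) assume that closed='right' means a fixed-size window ending at and including the current row, while closed='left' means a window ending just before it. Under that assumption the left-closed result is exactly the right-closed result shifted down by one row, which matches the reporter's workaround of applying shift(1) per id.

```python
from typing import List, Optional, Sequence


def rolling_mean_fixed(
    values: Sequence[float],
    window: int,
    min_periods: int = 1,
    closed: str = "right",
) -> List[Optional[float]]:
    """Hypothetical trailing mean over a fixed number of rows (not Polars code).

    closed="right": window covers rows i-window+1 .. i (includes row i).
    closed="left":  window covers rows i-window .. i-1 (excludes row i).
    Returns None where fewer than min_periods rows fall in the window.
    """
    if closed not in ("right", "left"):
        raise ValueError("this sketch only handles 'right' and 'left'")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        end = i + 1 if closed == "right" else i  # exclusive end index of the window
        start = max(0, end - window)
        chunk = values[start:end]
        ok = len(chunk) > 0 and len(chunk) >= min_periods
        out.append(sum(chunk) / len(chunk) if ok else None)
    return out


def shift(values: Sequence[Optional[float]], n: int = 1) -> List[Optional[float]]:
    """Shift values down by n rows, filling the vacated rows with None."""
    return ([None] * n + list(values))[: len(values)]


vals = [1.0, 3.0, 5.0]
print(rolling_mean_fixed(vals, 100, closed="right"))  # [1.0, 2.0, 3.0]
print(rolling_mean_fixed(vals, 100, closed="left"))   # [None, 1.0, 2.0]
```

With a window of 100 and fewer than 100 rows per group, both variants are simply expanding means; the left-closed one just lags by one row.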
Installed versions
---Version info---
Polars: 0.15.13
Index type: UInt32
Platform: Linux-5.4.0-122-generic-x86_64-with-glibc2.31
Python: 3.9.13 (main, May 23 2022, 22:01:06)
[GCC 9.4.0]
---Optional dependencies---
pyarrow: 8.0.0
pandas: 1.4.3
numpy: 1.23.1
fsspec: 2022.5.0
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: 3.5.2
I don't understand what over has to do with this. The result is also not shifted in a normal rolling_mean, right?
Can someone confirm that we are doing something incorrect here? I don't see the problem atm.
@ML-BCB: I'm trying to understand your example. Could you:
- Make the dummy data much smaller, using hard-coded values instead of random functions? That makes it easier to follow. For example, my guess is that you could show the problem with around 5 rows in a DataFrame.
- Specify what the output is. Is your expected output the same as the original, but shifted?
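To make the expected output concrete on a handful of rows, as requested above, here is a pure-Python sketch with made-up, hard-coded data. `expanding_mean_per_id` is a hypothetical helper, not Polars API. It assumes that with a window of 100 and min_periods=1 every window is simply expanding within its id, and that a left-closed window excludes the current row; it keeps a running sum and count per id in a single pass.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence

# Made-up, hard-coded data for illustration (not from the original report).
ids = [0, 1, 0, 1, 0]
values = [1.0, 2.0, 3.0, 4.0, 5.0]


def expanding_mean_per_id(
    ids: Sequence[int], values: Sequence[float], include_current: bool
) -> List[Optional[float]]:
    """Running mean within each id, in row order (hypothetical helper).

    include_current=True mimics a window that includes the current row;
    include_current=False mimics a left-closed window that excludes it.
    """
    total: Dict[int, float] = defaultdict(float)
    count: Dict[int, int] = defaultdict(int)
    out: List[Optional[float]] = []
    for key, v in zip(ids, values):
        if include_current:  # add the current row before taking the mean
            total[key] += v
            count[key] += 1
        out.append(total[key] / count[key] if count[key] else None)
        if not include_current:  # add the current row only after taking the mean
            total[key] += v
            count[key] += 1
    return out


print(expanding_mean_per_id(ids, values, include_current=True))   # [1.0, 2.0, 2.0, 3.0, 3.0]
print(expanding_mean_per_id(ids, values, include_current=False))  # [None, None, 1.0, 2.0, 2.0]
```

The second list is the first one shifted by one row within each id, which is the shape of output the reporter describes as expected.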
I can't find anything wrong with over here. For example, apply over and then filter on the id column, and compare that with filtering on the id column first:
>>> df.with_columns(
... (pl.col('value').rolling_mean(100,min_periods=1,closed='left').over('id')).suffix('_id_100_mean')
... ).filter(pl.col("id")==0)
shape: (11, 3)
┌─────┬──────────┬───────────────────┐
│ id ┆ value ┆ value_id_100_mean │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 │
╞═════╪══════════╪═══════════════════╡
│ 0 ┆ 0.875956 ┆ 0.875956 │
│ 0 ┆ 0.428889 ┆ 0.652422 │
│ 0 ┆ 0.906453 ┆ 0.737099 │
│ 0 ┆ 0.014574 ┆ 0.556468 │
│ 0 ┆ 0.379922 ┆ 0.521159 │
│ 0 ┆ 0.091058 ┆ 0.449475 │
│ 0 ┆ 0.788313 ┆ 0.497881 │
│ 0 ┆ 0.175671 ┆ 0.457604 │
│ 0 ┆ 0.330331 ┆ 0.443463 │
│ 0 ┆ 0.280692 ┆ 0.427186 │
│ 0 ┆ 0.808958 ┆ 0.461892 │
└─────┴──────────┴───────────────────┘
Filtering first, i.e.
df.filter(pl.col("id") == 0).with_columns(pl.col("value").rolling_mean(100, min_periods=1, closed='left'))
yields the same means.
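The check above relies on .over('id') behaving like "compute within each group, then write the results back to the original rows", so filtering before or after gives the same numbers for id 0. Below is a pure-Python sketch of that equivalence with made-up data; `over` and `expanding_mean` are hypothetical helpers, not Polars API. `expanding_mean` includes the current row, matching the output shown above, where the first mean equals the first value.

```python
from typing import Callable, Dict, List, Optional, Sequence


def over(
    keys: Sequence[int],
    values: Sequence[float],
    func: Callable[[List[float]], List[Optional[float]]],
) -> List[Optional[float]]:
    """Hypothetical emulation of a window expression: apply func to each
    group's values (in row order) and write the results back to the original rows."""
    positions: Dict[int, List[int]] = {}
    for i, k in enumerate(keys):
        positions.setdefault(k, []).append(i)
    out: List[Optional[float]] = [None] * len(values)
    for idx in positions.values():
        group_result = func([values[i] for i in idx])
        for i, r in zip(idx, group_result):
            out[i] = r
    return out


def expanding_mean(xs: List[float]) -> List[Optional[float]]:
    """Mean of all rows up to and including the current one."""
    out: List[Optional[float]] = []
    total = 0.0
    for n, x in enumerate(xs, start=1):
        total += x
        out.append(total / n)
    return out


ids = [0, 1, 0, 1, 0]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

# Path 1: windowed computation over all ids, then keep only id == 0.
windowed = over(ids, values, expanding_mean)
over_then_filter = [r for k, r in zip(ids, windowed) if k == 0]

# Path 2: keep only id == 0 first, then compute on that subset.
filter_then_compute = expanding_mean([v for k, v in zip(ids, values) if k == 0])

print(over_then_filter)     # [1.0, 2.0, 3.0]
print(filter_then_compute)  # [1.0, 2.0, 3.0]
```

Because the per-group computation never sees rows from other groups, the two paths agree; any shift (or lack of one) comes from the rolling window itself, not from over.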
Closing this given no response, and no one able to reproduce. If you run into this again, feel free to open a new issue.