Closed argument not working as expected on rolling aggregations

Open ML-BCB opened this issue 2 years ago • 2 comments

Polars version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

When using the closed='left' argument on rolling_mean() in combination with .over(), the result is not shifted at all.

Reproducible example

import polars as pl
import numpy as np

ids = np.random.randint(0,10,100)
values = np.random.random_sample(100)
df = pl.DataFrame({'id':ids,'value':values})
df.with_column(
     (pl.col('value').rolling_mean(100,min_periods=1,closed='left').over('id')).suffix('_id_100_mean')
)

Expected behavior

One would expect the values to be shifted by 1, as shown below. This is how it works for groupby_rolling(). The code below produces the expected result:

df.with_column(
     (pl.col('value').rolling_mean(100,min_periods=1,closed='left').over('id')).suffix('_id_100_mean')
).with_column(
    pl.col('value_id_100_mean').shift(1).over('id'))
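
To make "shifted by 1" concrete, here is a minimal pure-Python sketch of the two window conventions on a single group. It is not Polars code and does not claim to describe what closed='left' does inside Polars; the helper name trailing_mean and the sample values are hypothetical. It only illustrates the behavior this report appears to expect: a mean over the preceding rows that excludes the current row.

```python
from typing import List, Optional


def trailing_mean(
    values: List[float], window: int, include_current: bool
) -> List[Optional[float]]:
    """Mean over up to `window` rows ending at, or just before, each row."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        # The window ends after row i (inclusive) or before it (exclusive).
        end = i + 1 if include_current else i
        start = max(0, end - window)
        chunk = values[start:end]
        # An empty window has no mean; represent it as None.
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out


values = [1.0, 2.0, 3.0, 4.0]  # hypothetical data for one id

with_current = trailing_mean(values, window=100, include_current=True)
without_current = trailing_mean(values, window=100, include_current=False)

print(with_current)     # [1.0, 1.5, 2.0, 2.5]
print(without_current)  # [None, 1.0, 1.5, 2.0]
```

In this sketch, excluding the current row gives the same numbers as computing the inclusive mean and shifting it down by one row, which is what the workaround with shift(1) does.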

Installed versions

---Version info---
Polars: 0.15.13
Index type: UInt32
Platform: Linux-5.4.0-122-generic-x86_64-with-glibc2.31
Python: 3.9.13 (main, May 23 2022, 22:01:06) 
[GCC 9.4.0]
---Optional dependencies---
pyarrow: 8.0.0
pandas: 1.4.3
numpy: 1.23.1
fsspec: 2022.5.0
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: 3.5.2

ML-BCB avatar Jan 06 '23 19:01 ML-BCB

I don't understand what the over has to do with this? The result is also not shifted in a normal rolling_mean, right?

ritchie46 avatar Jan 09 '23 09:01 ritchie46

Can someone confirm that we do something incorrect here? I don't see the problem atm.

ritchie46 avatar Jan 19 '23 14:01 ritchie46

@ML-BCB : I'm trying to understand your example. Could you:

  1. Make the dummy data much smaller, using hard-coded values instead of random functions? That makes it easier to follow. For example, my guess is you could show the problem with about 5 rows in a dataframe, no?
  2. Specify what the actual output is? Is your expected output the original result, shifted by one row?

I can't find anything wrong with the over here. For example, compare with the outcome where we first filter on the id column:

>>> df.with_columns(
...      (pl.col('value').rolling_mean(100,min_periods=1,closed='left').over('id')).suffix('_id_100_mean')
... ).filter(pl.col("id")==0)
shape: (11, 3)
┌─────┬──────────┬───────────────────┐
│ id  ┆ value    ┆ value_id_100_mean │
│ --- ┆ ---      ┆ ---               │
│ i64 ┆ f64      ┆ f64               │
╞═════╪══════════╪═══════════════════╡
│ 0   ┆ 0.875956 ┆ 0.875956          │
│ 0   ┆ 0.428889 ┆ 0.652422          │
│ 0   ┆ 0.906453 ┆ 0.737099          │
│ 0   ┆ 0.014574 ┆ 0.556468          │
│ 0   ┆ 0.379922 ┆ 0.521159          │
│ 0   ┆ 0.091058 ┆ 0.449475          │
│ 0   ┆ 0.788313 ┆ 0.497881          │
│ 0   ┆ 0.175671 ┆ 0.457604          │
│ 0   ┆ 0.330331 ┆ 0.443463          │
│ 0   ┆ 0.280692 ┆ 0.427186          │
│ 0   ┆ 0.808958 ┆ 0.461892          │
└─────┴──────────┴───────────────────┘
>>> df.filter(pl.col("id")==0).with_columns(pl.col("value").rolling_mean(100,min_periods=1,closed='left'))

yields the same.
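
The same comparison can be sketched with a few hard-coded rows in plain Python, without Polars. This is only an illustration of the check, under the assumption that the window is large enough to cover the whole group (as with a window of 100 here); the helper names and the data are hypothetical.

```python
from collections import defaultdict


def expanding_mean(values):
    """Running mean over all rows seen so far (current row included)."""
    out, total = [], 0.0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out


def expanding_mean_over(ids, values):
    """Running mean computed within each id, kept in original row order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    out = []
    for key, v in zip(ids, values):
        sums[key] += v
        counts[key] += 1
        out.append(sums[key] / counts[key])
    return out


# Hypothetical 5-row dataset with two ids interleaved.
ids = [0, 1, 0, 1, 0]
values = [1.0, 10.0, 3.0, 20.0, 5.0]

# Route 1: compute per id over the full data, then keep the rows with id == 0.
windowed = expanding_mean_over(ids, values)
via_over = [m for key, m in zip(ids, windowed) if key == 0]

# Route 2: keep the rows with id == 0 first, then compute.
via_filter = expanding_mean([v for key, v in zip(ids, values) if key == 0])

print(via_over)    # [1.0, 2.0, 3.0]
print(via_filter)  # [1.0, 2.0, 3.0]
```

Both routes agree, which mirrors the observation above: grouping by id does not by itself change the values within a group.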

zundertj avatar Feb 04 '23 20:02 zundertj

Closing this given no response and no one being able to reproduce. If you run into this again, feel free to open a new issue.

zundertj avatar Mar 07 '23 20:03 zundertj