enh: allow for `.over()`
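We should allow calling `.over()` with no arguments, so that an aggregation is broadcast to every row:
nw.col('a').sum().over()
For example, for data {'a': [1, 2, 3], 'b': [4, 5, 6]}, df.select(nw.col('a').sum().over(), 'b') should produce {'a': [6, 6, 6], 'b': [4, 5, 6]}.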
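To make the requested semantics concrete, here is a minimal pure-Python sketch. It is not Narwhals or Polars code, and the helper name `over_all` is hypothetical: it computes one aggregate over the whole column and repeats it for every row, which is what the example expects from an aggregation followed by an argument-less `.over()`.

```python
def over_all(values, agg):
    """Apply `agg` to the whole column and broadcast the result to every row.

    Hypothetical helper, for illustration only; it mimics the proposed
    behaviour of an aggregation followed by `.over()` with no arguments.
    """
    result = agg(values)
    return [result] * len(values)


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
# Column 'a' becomes its broadcast sum; column 'b' is passed through unchanged.
selected = {"a": over_all(data["a"], sum), "b": data["b"]}
print(selected)  # {'a': [6, 6, 6], 'b': [4, 5, 6]}
```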
This is not supported in Polars either 🤔
import polars as pl
data = {"a": [5, 4, 3, 2, 1]}
pl.DataFrame(data).with_columns(a_max=pl.col("a").max().over())
TypeError: Expr.over() missing 1 required positional argument: 'partition_by'
Unrelated, but I just noticed that the signature of nw.Expr.over is quite different from that of polars.Expr.over.
Thanks - yeah, we should align them.
I also think we should allow this in Polars, so that people can write:
In [9]: df = pl.DataFrame({'a': [1,1,2], 'b': [4,5,6], 'c': [2, 1, 3]})
In [10]: df.with_columns(d=pl.col('a').cum_sum().over(order_by='c'))
Out[10]:
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 2   ┆ 2   │
│ 1   ┆ 5   ┆ 1   ┆ 1   │
│ 2   ┆ 6   ┆ 3   ┆ 4   │
└─────┴─────┴─────┴─────┘
(Currently, this requires df.with_columns(d=pl.col('a').cum_sum().over(pl.lit(1), order_by='c')).)
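For readers unsure what the ordered cumulative sum is meant to compute, here is a rough pure-Python sketch of the semantics. The function name `cum_sum_ordered` is made up for illustration and this is not how Polars implements it: rows are visited in the order given by `c`, a running total of `a` is kept, and each total is written back to the row's original position.

```python
def cum_sum_ordered(values, order_keys):
    """Cumulative sum of `values` taken in the order given by `order_keys`,
    with each result written back to the row's original position.

    Hypothetical helper, for illustration only.
    """
    # Row indices sorted by the ordering column (sorted() is stable for ties).
    order = sorted(range(len(values)), key=lambda i: order_keys[i])
    out = [None] * len(values)
    running = 0
    for i in order:
        running += values[i]
        out[i] = running
    return out


a = [1, 1, 2]
c = [2, 1, 3]
d = cum_sum_ordered(a, c)
print(d)  # [2, 1, 4]
```

This reproduces column `d` in the output shown for the example frame.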
+1 for aligning the signature of Expr.over with recent Polars. I was trying to write a basic window function (e.g. count("a") over (partition by "a" order by "b" asc)) and couldn't get a proper order by to work with the existing method.
@MarcoGorelli this is currently possible with order-dependent ops, but not with general aggregations.
I'm not sure which sweet spot you want to reach 👀
This was rejected in Polars: you have to specify at least one of order_by or partition_by.