Polars lit, scalar error with over clause
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3]})
df.with_columns(
    pl.concat_list(pl.col("a"))
    .over(pl.lit(""), mapping_strategy="join")
    .list.eval(pl.element().count())
    .alias("b")
)
Log output
No response
Issue description
I would like to count elements over a placeholder column that does not exist in the DataFrame. As of version 1.4.1, the following code did this:
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3]})
df.with_columns(
    pl.concat_list(pl.col("a"))
    .over(pl.lit(""), mapping_strategy="join")
    .list.eval(pl.element().count())
    .alias("b")
)
In version 1.9.0, however, we get the following error, even when using pl.lit("").first():
InvalidOperationError: Series b, length 1 doesn't match the DataFrame height of 3
If you want expression: col("a").list.concat().over([String()]).eval() to be broadcasted, ensure it is a scalar (for instance by adding '.first()').
Using mapping_strategy="group_to_rows" instead does not fail, but it produces the wrong output.
Expected behavior
The output should be:
shape: (3, 2)
┌─────┬───────────┐
│ a   ┆ b         │
│ --- ┆ ---       │
│ i64 ┆ list[u32] │
╞═════╪═══════════╡
│ 1   ┆ [3]       │
│ 2   ┆ [3]       │
│ 3   ┆ [3]       │
└─────┴───────────┘
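To make the expected result concrete, here is a minimal plain-Python sketch of the intended semantics. It does not use Polars, and `count_over` is a hypothetical helper introduced only for illustration: with a constant grouping key every row falls into the same group, so each row should receive a one-element list holding the group size.

```python
from collections import defaultdict


def count_over(values, keys):
    """Hypothetical helper: per-row one-element list with the size of the row's group."""
    groups = defaultdict(list)
    for value, key in zip(values, keys):
        groups[key].append(value)
    # Every row looks up its own group and reports how many values it holds.
    return [[len(groups[key])] for key in keys]


a = [1, 2, 3]
# A constant placeholder key, playing the role of pl.lit("") in the report.
b = count_over(a, [""] * len(a))
print(b)
```

This only restates the expected values; it says nothing about how Polars computes them internally.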
In general, we currently do not handle broadcasting of scalar lists correctly: we don't distinguish between a scalar List expression and a Series expression. A Series expression inside an over context should match the length of the group, while a scalar expression should broadcast to each element in the group. For example, the following two results are correct:
>>> df = pl.DataFrame({"x": [1, 2, 3], "g": [1, 1, 2]})
>>> df.select(pl.col.x.reverse().over("g"))
shape: (3, 1)
┌─────┐
│ x   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 1   │
│ 3   │
└─────┘
>>> df.select(pl.col.x.first().over("g"))
shape: (3, 1)
┌─────┐
│ x   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 1   │
│ 3   │
└─────┘
However, this is incorrect:
>>> df.select(pl.col.x.implode().over("g"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/orlp/.localpython/lib/python3.11/site-packages/polars/dataframe/frame.py", line 9010, in select
return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/orlp/.localpython/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2050, in collect
return wrap_df(ldf.collect(callback))
^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: the length of the window expression did not match that of the group
> group: (1)
> group length: 1
> output: 'shape: (1,)
Series: '' [list[i64]]
[
[1, 2]
]'
Error originated in expression: 'col("x").list().over([col("g")])'
This should just broadcast, as Expr.implode() is a scalar expression returning a list. This should result in:
┌───────────┐
│ x         │
│ --------- │
│ list[i64] │
╞═══════════╡
│ [1, 2]    │
│ [1, 2]    │
│ [3]       │
└───────────┘
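The rule described above (a Series result must match the group length, a scalar result is broadcast to every row of the group) can be sketched in plain Python. This is a simplified model, not Polars's implementation: the `Scalar` wrapper and the `over` function are hypothetical names introduced here, and the only assumption is that an expression knows whether its result is a scalar.

```python
from collections import defaultdict


class Scalar:
    """Hypothetical marker: a per-group result that should be broadcast."""

    def __init__(self, value):
        self.value = value


def over(values, keys, fn):
    """Apply fn to each group and map the results back to row positions."""
    positions = defaultdict(list)
    for i, key in enumerate(keys):
        positions[key].append(i)

    out = [None] * len(values)
    for idx in positions.values():
        result = fn([values[i] for i in idx])
        if isinstance(result, Scalar):
            # Scalar results are repeated for every row in the group.
            result = [result.value] * len(idx)
        elif len(result) != len(idx):
            # Series results must have exactly one value per row.
            raise ValueError("window output length does not match group length")
        for i, r in zip(idx, result):
            out[i] = r
    return out


x = [1, 2, 3]
g = [1, 1, 2]

reversed_x = over(x, g, lambda grp: grp[::-1])           # Series result
first_x = over(x, g, lambda grp: Scalar(grp[0]))         # scalar result
imploded_x = over(x, g, lambda grp: Scalar(list(grp)))   # scalar list result

print(reversed_x, first_x, imploded_x)
```

In this model, treating the imploded list as a length-1 Series rather than a scalar (`lambda grp: [list(grp)]`) raises the length-mismatch error for the two-row group, which is the kind of misclassification described above.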