polars
gather_every in list.eval raises InvalidOperationError
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl

df = pl.DataFrame({'a': [[1, 2, 3], [4, 5, 6]]})
df.with_columns(result=pl.col('a').list.eval(pl.element().gather_every(2)))
Log output
Traceback (most recent call last):
File "/home/marcogorelli/tmp/t.py", line 4, in <module>
df.with_columns(result=pl.col('a').list.eval(pl.element().gather_every(2)))
File "/home/marcogorelli/tmp/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py", line 8235, in with_columns
return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
File "/home/marcogorelli/tmp/.venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1749, in collect
return wrap_df(ldf.collect())
polars.exceptions.InvalidOperationError: output length of `map` (6) must be equal to the input length (3); consider using `apply` instead
Error originated in expression: 'col("").gather_every()'
Issue description
The above raises InvalidOperationError, but I think the output is well-defined.
Expected behavior
shape: (2, 2)
┌───────────┬───────────┐
│ a ┆ result │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [1, 3] │
│ [4, 5, 6] ┆ [4, 6] │
└───────────┴───────────┘
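For reference, the expected result above matches taking every 2nd element of each inner list, starting at index 0. A minimal plain-Python sketch of those per-row semantics follows; the helper name `gather_every_per_row` is hypothetical and not part of Polars, and it only illustrates the intended output, not how Polars implements it.

```python
def gather_every_per_row(rows, n, offset=0):
    """Hypothetical helper (not a Polars API): take every n-th element
    of each inner list, starting at `offset`."""
    return [row[offset::n] for row in rows]


rows = [[1, 2, 3], [4, 5, 6]]
result = gather_every_per_row(rows, 2)
print(result)  # [[1, 3], [4, 6]]
```

With `n=1` the helper returns each list unchanged, so the output length equals the input length.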
Installed versions
--------Version info---------
Polars: 0.20.3
Index type: UInt32
Platform: Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: 2.2.1
connectorx: <not installed>
deltalake: 0.9.0
fsspec: 2023.6.0
gevent: <not installed>
hvplot: 0.9.1
matplotlib: 3.7.1
numpy: 1.26.2
openpyxl: <not installed>
pandas: 2.3.0.dev0+14.g62dbbe6713
pyarrow: 14.0.1
pydantic: 2.0.2
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: 2.0.20
xlsx2csv: <not installed>
xlsxwriter: <not installed>
@MarcoGorelli in your example code you put gather(0) instead of gather_every(2), what you probably meant? ;)
It seems to be related to the expected output size. It actually works for gather_every(1), where the output has the same length as the input:
df.with_columns(result=pl.col('a').list.eval(pl.element().gather_every(1)))
shape: (2, 2)
┌───────────┬───────────┐
│ a ┆ result │
│ --- ┆ --- │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [1, 2, 3] │
│ [4, 5, 6] ┆ [4, 5, 6] │
└───────────┴───────────┘
> what you probably meant? ;)
thanks for spotting that, have updated!
Thanks for reporting this! I believe the problem comes from gather_every itself rather than list.eval.