gather_every in list.eval raises InvalidOperationError

Open MarcoGorelli opened this issue 1 year ago • 2 comments

Checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({'a': [[1, 2, 3], [4, 5, 6]]})
df.with_columns(result=pl.col('a').list.eval(pl.element().gather_every(2)))

Log output

Traceback (most recent call last):
  File "/home/marcogorelli/tmp/t.py", line 4, in <module>
    df.with_columns(result=pl.col('a').list.eval(pl.element().gather_every(2)))
  File "/home/marcogorelli/tmp/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py", line 8235, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
  File "/home/marcogorelli/tmp/.venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1749, in collect
    return wrap_df(ldf.collect())
polars.exceptions.InvalidOperationError: output length of `map` (6) must be equal to the input length (3); consider using `apply` instead

Error originated in expression: 'col("").gather_every()'

Issue description

The above throws, but I think the output is well-defined

Expected behavior

shape: (2, 2)
┌───────────┬───────────┐
│ a         ┆ result    │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [1, 3]    │
│ [4, 5, 6] ┆ [4, 6]    │
└───────────┴───────────┘
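
As a plain-Python sketch of the semantics expected here (no Polars involved; `gather_every_per_list` is a hypothetical helper name used only for illustration, not a Polars function), taking every second element of each sublist independently gives the table above:

```python
def gather_every_per_list(rows, n):
    """Take every n-th element of each sublist, starting at index 0."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slice each sublist on its own, so the number of rows is unchanged.
    return [row[::n] for row in rows]


a = [[1, 2, 3], [4, 5, 6]]
result = gather_every_per_list(a, 2)
print(result)  # [[1, 3], [4, 6]]
```

Each row is handled separately, so the result has one list per input row, which is why the output looks well-defined.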

Installed versions

--------Version info---------
Polars:               0.20.3
Index type:           UInt32
Platform:             Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            0.9.0
fsspec:               2023.6.0
gevent:               <not installed>
hvplot:               0.9.1
matplotlib:           3.7.1
numpy:                1.26.2
openpyxl:             <not installed>
pandas:               2.3.0.dev0+14.g62dbbe6713
pyarrow:              14.0.1
pydantic:             2.0.2
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.20
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>

MarcoGorelli avatar Jan 03 '24 11:01 MarcoGorelli

@MarcoGorelli in your example code you put gather(0) instead of gather_every(2), which is what you probably meant? ;)

It seems to be related to the expected output size. It actually works for gather_every(1):

df.with_columns(result=pl.col('a').list.eval(pl.element().gather_every(1)))
shape: (2, 2)
┌───────────┬───────────┐
│ a         ┆ result    │
│ ---       ┆ ---       │
│ list[i64] ┆ list[i64] │
╞═══════════╪═══════════╡
│ [1, 2, 3] ┆ [1, 2, 3] │
│ [4, 5, 6] ┆ [4, 5, 6] │
└───────────┴───────────┘
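
To illustrate the output-size observation, here is a small plain-Python comparison of sublist lengths before and after taking every n-th element. It only mirrors the length arithmetic using slicing as a stand-in; it is not how Polars performs its internal check, and `preserves_length` is a name made up for this sketch.

```python
a = [[1, 2, 3], [4, 5, 6]]

# For each step n, record (input length, output length) per sublist.
lengths_by_step = {
    n: [(len(row), len(row[::n])) for row in a]
    for n in (1, 2, 3)
}

# A step "preserves length" when every sublist keeps its original size.
preserves_length = {
    n: all(before == after for before, after in pairs)
    for n, pairs in lengths_by_step.items()
}

for n, pairs in lengths_by_step.items():
    print(n, pairs, preserves_length[n])
```

Only a step of 1 leaves every sublist at its original length, which matches gather_every(1) being the case that does not raise.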

Julian-J-S avatar Jan 03 '24 12:01 Julian-J-S

what you probably meant? ;)

thanks for spotting that, have updated!

MarcoGorelli avatar Jan 03 '24 12:01 MarcoGorelli

Thanks for reporting this! I believe the problem comes from gather_every itself rather than list.eval.

reswqa avatar Jan 18 '24 13:01 reswqa