concat_list raises an error or returns an empty list if one of the filtered columns inside it is empty
Checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import polars as pl

df = pl.LazyFrame({'a': [1, 2], 'b': [3, 4], 'c': [0, 0]})
df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(1)),
        pl.col('b').filter(pl.col('b').ge(0))
    ).flatten()
]).collect()
works as expected:
shape: (1, 2)
┌─────┬─────────────┐
│ c   ┆ a           │
│ --- ┆ ---         │
│ i64 ┆ list[i64]   │
╞═════╪═════════════╡
│ 0   ┆ [1, 3, … 4] │
└─────┴─────────────┘
However, if either the first or the second filter selects no rows, the behavior is unexpected:
df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(5)),
        pl.col('b').filter(pl.col('b').ge(0))
    ).flatten()
]).collect()
raises polars.exceptions.ShapeError: series length 2 does not match expected length of 0
And if you add an extra .first() to the second filter, no error is raised; instead it silently returns an incorrect empty list:
df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(5)),
        pl.col('b').filter(pl.col('b').ge(0)).first()
    ).flatten()
]).collect()
shape: (1, 2)
┌─────┬───────────┐
│ c   ┆ a         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 0   ┆ []        │
└─────┴───────────┘
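One way to read both outcomes, offered as an assumption rather than a description of Polars internals: inside agg, concat_list seems to work row-wise within each group, broadcasting length-1 inputs and rejecting any other length mismatch. The plain-Python sketch below models that rule (concat_rows and flatten are hypothetical helpers, and the model raises ValueError where Polars raises ShapeError). It reproduces the error for lengths 0 and 2, and shows that lengths 0 and 1 broadcast to zero rows, which flattens to [].

```python
def concat_rows(*columns):
    """Hypothetical model of a row-wise list concat with broadcasting.

    Each argument is the list of values one expression produced for a group.
    Length-1 inputs are broadcast; any other length must match the target.
    """
    # Target length: the first input whose length is not 1 (else 1).
    target = next((len(col) for col in columns if len(col) != 1), 1)
    for col in columns:
        if len(col) not in (1, target):
            # Stands in for polars.exceptions.ShapeError in this model.
            raise ValueError(
                f"series length {len(col)} does not match expected length of {target}"
            )
    # Broadcast unit-length inputs to the target, then build one list per row.
    expanded = [col * target if len(col) == 1 else col for col in columns]
    return [list(row) for row in zip(*expanded)]


def flatten(rows):
    # Concatenate the per-row lists end to end, like .flatten() on the result.
    return [value for row in rows for value in row]


a_empty = []         # pl.col('a').filter(pl.col('a').eq(5)) in group c == 0
b_all = [3, 4]       # pl.col('b').filter(pl.col('b').ge(0))
b_first = b_all[:1]  # the same filter followed by .first()

try:
    concat_rows(a_empty, b_all)
except ValueError as exc:
    print(exc)  # series length 2 does not match expected length of 0

print(flatten(concat_rows(a_empty, b_first)))  # []
```

Under the same model, a length-1 input paired with a length-2 input yields [[1, 3], [1, 4]], which flattens to [1, 3, 1, 4]; that would also explain the truncated "[1, 3, … 4]" in the first output.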
Log output
No response
Issue description
...
Expected behavior
...
Installed versions
--------Version info---------
Polars: 0.20.14
Index type: UInt32
Platform: Linux-6.1.0-1035-oem-x86_64-with-glibc2.35
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fastexcel: <not installed>
fsspec: <not installed>
gevent: <not installed>
hvplot: <not installed>
matplotlib: <not installed>
numpy: 1.26.4
openpyxl: <not installed>
The "working" version also appears to be broken: the 1 from a ends up paired with every value from b.
df.group_by('c').agg(
    pl.col('a').filter(pl.col('a').eq(1)),
    pl.col('b').filter(pl.col('b').ge(0))
).collect()
# shape: (1, 3)
# ┌─────┬───────────┬───────────┐
# │ c   ┆ a         ┆ b         │
# │ --- ┆ ---       ┆ ---       │
# │ i64 ┆ list[i64] ┆ list[i64] │
# ╞═════╪═══════════╪═══════════╡
# │ 0   ┆ [1]       ┆ [3, 4]    │
# └─────┴───────────┴───────────┘
df.group_by('c').agg(
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(1)),
        pl.col('b').filter(pl.col('b').ge(0))
    )
).collect()
# shape: (1, 2)
# ┌─────┬──────────────────┐
# │ c   ┆ a                │
# │ --- ┆ ---              │
# │ i64 ┆ list[list[i64]]  │
# ╞═════╪══════════════════╡
# │ 0   ┆ [[1, 3], [1, 4]] │ # <- ???
# └─────┴──────────────────┘
.append() may be a possible workaround.
df.group_by('c').agg(
    pl.col('a').filter(pl.col('a').eq(1)).append(
        pl.col('b').filter(pl.col('b').ge(0))
    )
).collect()
# shape: (1, 2)
# ┌─────┬───────────┐
# │ c   ┆ a         │
# │ --- ┆ ---       │
# │ i64 ┆ list[i64] │
# ╞═════╪═══════════╡
# │ 0   ┆ [1, 3, 4] │
# └─────┴───────────┘
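The result .append() gives here matches plain per-group concatenation: filter each column within the group, then join the pieces end to end, so an empty filter contributes nothing instead of changing the shape. The plain-Python sketch below models those semantics, assuming this is the behavior the original report wanted. The helper agg_append is hypothetical, and its output for the eq(5) case is only what the model computes, not a claim about what Polars returns.

```python
from collections import defaultdict


def agg_append(rows, key, parts):
    """Hypothetical model of group_by(key).agg(part1.append(part2, ...)).

    rows:  list of dicts, one per row.
    parts: list of (column, predicate) pairs; each keeps the column's
           values that pass the predicate within a group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = {}
    for k, members in groups.items():
        out = []
        for column, keep in parts:
            # An empty filter result simply adds nothing to the group's list.
            out.extend(r[column] for r in members if keep(r[column]))
        result[k] = out
    return result


rows = [{"a": 1, "b": 3, "c": 0}, {"a": 2, "b": 4, "c": 0}]
print(agg_append(rows, "c", [("a", lambda v: v == 1), ("b", lambda v: v >= 0)]))  # {0: [1, 3, 4]}
print(agg_append(rows, "c", [("a", lambda v: v == 5), ("b", lambda v: v >= 0)]))  # {0: [3, 4]}
```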
Thanks @cmdlineluser, append works for my use case.
IDK if this is the same issue, but in Ibis we need to construct a Polars list from a Python list literal of length 0 to N. For length >= 1 we can use concat_list, but for length 0 we have to use pl.lit. It would be great if there was one API that could do both. Could we add a keyword-only type argument to concat_list()? I can move this to a separate issue if you want. Thanks!