concat_list raises an error or returns an empty list if one of the filtered cols inside is empty

Open avlonder opened this issue 11 months ago • 3 comments

Checks

  • [X] I have checked that this issue has not already been reported.
  • [X] I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.LazyFrame({'a': [1, 2], 'b': [3, 4], 'c': [0, 0]})

df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(1)),
        pl.col('b').filter(pl.col('b').ge(0))
    ).flatten()
]).collect()

works as expected:

shape: (1, 2)
┌─────┬─────────────┐
│ c   ┆ a           │
│ --- ┆ ---         │
│ i64 ┆ list[i64]   │
╞═════╪═════════════╡
│ 0   ┆ [1, 3, … 4] │
└─────┴─────────────┘

However, if either filter matches no rows, the behavior is unexpected:

df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(5)),
        pl.col('b').filter(pl.col('b').ge(0))
    ).flatten()
]).collect()

raises polars.exceptions.ShapeError: series length 2 does not match expected length of 0

If you add an extra first() operation, no error is raised; instead, it silently returns an incorrect empty list:

df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(5)),
        pl.col('b').filter(pl.col('b').ge(0)).first()
    ).flatten()
]).collect()

shape: (1, 2)
┌─────┬───────────┐
│ c   ┆ a         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 0   ┆ []        │
└─────┴───────────┘

Log output

No response

Issue description

...

Expected behavior

...

Installed versions

--------Version info---------
Polars:               0.20.14
Index type:           UInt32
Platform:             Linux-6.1.0-1035-oem-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
numpy:                1.26.4
openpyxl:             <not installed>

avlonder • Mar 21 '24 13:03

The working version also appears to be broken.

The 1 from a ends up in b

df.group_by('c').agg(
    pl.col('a').filter(pl.col('a').eq(1)),
    pl.col('b').filter(pl.col('b').ge(0))
).collect()
# shape: (1, 3)
# ┌─────┬───────────┬───────────┐
# │ c   ┆ a         ┆ b         │
# │ --- ┆ ---       ┆ ---       │
# │ i64 ┆ list[i64] ┆ list[i64] │
# ╞═════╪═══════════╪═══════════╡
# │ 0   ┆ [1]       ┆ [3, 4]    │
# └─────┴───────────┴───────────┘
df.group_by('c').agg(
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(1)),
        pl.col('b').filter(pl.col('b').ge(0))
    )
).collect()
# shape: (1, 2)
# ┌─────┬──────────────────┐
# │ c   ┆ a                │
# │ --- ┆ ---              │
# │ i64 ┆ list[list[i64]]  │
# ╞═════╪══════════════════╡
# │ 0   ┆ [[1, 3], [1, 4]] │ # <- ???
# └─────┴──────────────────┘

.append() may be a possible workaround.

df.group_by('c').agg(
    pl.col('a').filter(pl.col('a').eq(1)).append(
        pl.col('b').filter(pl.col('b').ge(0))
    )
).collect()

# shape: (1, 2)
# ┌─────┬───────────┐
# │ c   ┆ a         │
# │ --- ┆ ---       │
# │ i64 ┆ list[i64] │
# ╞═════╪═══════════╡
# │ 0   ┆ [1, 3, 4] │
# └─────┴───────────┘

cmdlineluser • Mar 21 '24 14:03

Thanks @cmdlineluser, append works for my use case.

avlonder • Mar 21 '24 14:03

IDK if this is the same issue, but in Ibis we need to construct a Polars list from a Python list literal of length 0 to N. For length >= 1 we can use concat_list, but for length 0 we have to use pl.lit. It would be great if there was one API that could do both. Could we add a keyword-only type argument to concat_list()? I can move this to a separate issue if you want. Thanks!

NickCrews • May 12 '24 15:05