
Panic: polars, sort on a list

Open cchudant opened this issue 3 years ago • 3 comments

Repro:

1)
import polars as pl
from bastionlab.polars.policy import Policy, TrueRule, Log

repro_df = pl.DataFrame({ "hello": [1, 2, 3], "world": [1, 2, 3] })
repro_rdf = connection.client.polars.send_df(repro_df, policy=Policy(safe_zone=TrueRule(), unsafe_handling=Log(), savable=False))

(
    repro_rdf
    .groupby(pl.col("hello"))
    .agg(pl.col("world")) # replace with .agg(pl.col("world")).sum() for it to work
    .sort(pl.col("world"))
).collect().fetch()

results in

thread 'tokio-runtime-worker' panicked at 'this operation is not implemented/valid for this dtype: List(Int64)', /home/cchudant/.cargo/registry/src/github.com-1ecc6299db9ec823/polars-core-0.25.1/src/series/series_trait.rs:427:9

The panic originates inside polars, so I'm not sure how to proceed here.

cchudant, Jan 12 '23 13:01

I believe the operation you're trying to perform does not really make sense. The aggregation step does not apply an aggregation function, so for each group you get the list of values from the "world" column. You then try to sort on that same column, which fails because polars doesn't know how to sort lists.

>>> import polars as pl
>>> df = pl.DataFrame({ "hello": [1,2,3], "world": [1,2,3]})
>>> df.lazy().groupby(pl.col("hello")).agg(pl.col("world")).collect()
shape: (3, 2)
┌───────┬───────────┐
│ hello ┆ world     │
│ ---   ┆ ---       │
│ i64   ┆ list[i64] │
╞═══════╪═══════════╡
│ 2     ┆ [2]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 3     ┆ [3]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ 1     ┆ [1]       │
└───────┴───────────┘
>>> df.lazy().groupby(pl.col("hello")).agg(pl.col("world")).sort(pl.col("world")).collect()
thread '<unnamed>' panicked at 'this operation is not implemented/valid for this dtype: List(Int64)', /home/runner/work/polars/polars/polars/polars-core/src/series/series_trait.rs:427:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dhalf/Documents/bastionlab/env/lib/python3.10/site-packages/polars/internals/lazyframe/frame.py", line 920, in collect
    return pli.wrap_df(ldf.collect())
pyo3_runtime.PanicException: this operation is not implemented/valid for this dtype: List(Int64)

dhalf, Jan 13 '23 10:01

That said, we can probably improve error reporting.
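
One possible shape for this, as a rough sketch only: check the dtype of the sort column before running the query and raise a readable error instead of letting the panic surface. The function name `check_sortable` and the schema-as-dict representation are hypothetical, not part of BastionLab or polars.

```python
def check_sortable(schema: dict[str, str], column: str) -> None:
    """Raise a descriptive error if `column` cannot be used as a sort key.

    `schema` maps column names to dtype strings such as "i64" or "list[i64]".
    Both the helper and this schema format are illustrative assumptions.
    """
    dtype = schema[column]
    if dtype.startswith("list["):
        raise TypeError(
            f"cannot sort by column {column!r} of dtype {dtype}: "
            "aggregate it to a scalar first (for example with sum, min or first)"
        )


# A scalar column passes the check silently.
check_sortable({"hello": "i64", "world": "i64"}, "world")

# A list column, as produced by .agg(pl.col("world")), is rejected up front.
try:
    check_sortable({"hello": "i64", "world": "list[i64]"}, "world")
except TypeError as err:
    print(err)
```

The point is only that the failure would be reported as an ordinary error naming the column and its dtype, rather than as a panic in a worker thread.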

dhalf, Jan 13 '23 10:01

Should I close this issue and reopen it in polars instead, then?

cchudant, Jan 13 '23 14:01