polars
groupby for list and struct type columns
Thank you for this absolutely wonderful library!
I'm afraid I hit a snag. What I tried to do was to group by a nested data type, as in:
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": [[1, 3, 4], [2, 4, 6], [17]]})
df.groupby("b").agg(pl.sum("a"))
This results in a "not implemented" panic.
I'm curious as to whether this is simply not implemented yet or whether this would contradict the underlying philosophy of polars.
Best regards Peter
We do not support grouping by a column of type list. I think we should improve the error message on that.
Thank you very much for the quick answer!
@ritchie46 which error do you think should be raised here? DataTypeMisMatch?
I think a ComputeError would be most consistent.
For structs we could temporarily unnest -> do the groupby -> and nest again.
Just in case anybody else stumbles upon this, the workaround I am now using is to convert to "str". Not ideal, but does the trick.
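import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": [[1, 3, 4], [2, 4, 6], [17]]})
# Cast each list element to a string and join with "|", so "b" becomes a plain string key
df = df.with_column(pl.col("b").arr.eval(pl.element().cast(pl.Utf8)).arr.join("|"))
df.groupby("b").agg(pl.sum("a"))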