polars
polars copied to clipboard
GROUPING SETS feature in Polars
Problem description
Recently, I needed to filter data to look at rolled up aggregates at different levels in Python.
I was quite fond of data.table
s implementation of it in R, and I searched for something similar in Polars, but it wasn't there. Thankfully, DuckDB's Polars integration came to the rescue here, but I would love to see it implemented in Polars if it isn't too much effort.
Handy reference for GROUPINGSETS, ROLLUP and CUBE: PostgresSQL, Oracle and SQL server.
As a distinct feature which would be difficult in plain SQL, the issue of using null
as the column value for higher level rollups can be replaced with an argument on how to represent it (such as "Total"
).
Similar to #6645 but not as a feature request. and not as a year/month/day rollup. The solution there works fine for fewer columns, but not for larger combinations:
groupingsets=[[pl.col('date').dt.year().alias("year"), pl.col('date').dt.month().alias("month")],
[pl.col('date').dt.year().alias("year")],
[pl.lit(None).cast(pl.Int32()).alias('year')]]
pl.concat(
[exdf.groupby(x).agg(pl.col('x').mean()) for x in groupingsets], how='diagonal'
).sort(['year','month'])