polars icon indicating copy to clipboard operation
polars copied to clipboard

GROUPING SETS feature in Polars

Open avimallu opened this issue 1 year ago • 8 comments

Problem description

Recently, I needed to filter data to look at rolled up aggregates at different levels in Python.

I was quite fond of data.tables implementation of it in R, and I searched for something similar in Polars, but it wasn't there. Thankfully, DuckDB's Polars integration came to the rescue here, but I would love to see it implemented in Polars if it isn't too much effort.

Handy reference for GROUPINGSETS, ROLLUP and CUBE: PostgresSQL, Oracle and SQL server.

As a distinct feature which would be difficult in plain SQL, the issue of using null as the column value for higher level rollups can be replaced with an argument on how to represent it (such as "Total").

Similar to #6645 but not as a feature request. and not as a year/month/day rollup. The solution there works fine for fewer columns, but not for larger combinations:

groupingsets=[[pl.col('date').dt.year().alias("year"), pl.col('date').dt.month().alias("month")],
              [pl.col('date').dt.year().alias("year")],
              [pl.lit(None).cast(pl.Int32()).alias('year')]] 
pl.concat(
    [exdf.groupby(x).agg(pl.col('x').mean()) for x in groupingsets], how='diagonal'
    ).sort(['year','month'])

avimallu avatar Apr 03 '23 03:04 avimallu