polars: add ```apply``` func to groupby_rolling, groupby_dynamic for a dataframe context

Describe your feature request

Currently, groupby_rolling and groupby_dynamic can only aggregate in a pl.col expression context, but sometimes we need to apply a function over the whole dataframe for analysis. We can use pl.apply([columns], func(pl.DataFrame(columns))) to reconstruct that, but I remember that very old versions had an API, df.rolling(windows).apply. Can this be added to the current API?

May 08 '22 12:05 yutiansut

expected:

df.groupby_rolling(.....).apply( 

df.groupby_dynamic(.....).apply(

Maybe an __iter__ for the groupby class?

May 08 '22 12:05 yutiansut

Do you mean the same functionality as the ordinary groupby has?

This is what it does:

    def apply(
        self,
        func: Callable[[Any], Any],
        return_dtype: Optional[Type[DataType]] = None,
    ) -> DF:
        """
        Apply a function over the groups.
        """
        df = self.agg_list()
        if self.selection is None:
            raise TypeError(
                "apply not available for Groupby.select_all(). Use select() instead."
            )
        for name in self.selection:
            s = df.drop_in_place(name + "_agg_list").apply(func, return_dtype)
            s.rename(name, in_place=True)
            df[name] = s

        return df

May 11 '22 11:05 ritchie46

+1

Yes, I believe this is what I'm after also.

I'm wanting to do the following steps:

1. Apply a groupby_rolling() to a data frame
2. then, for each group/window produced by the groupby_rolling(), I want to groupby()/groupby_dynamic() that
3. so that I can apply various aggregation functions to those sub-groups and return them as new columns for each row

To put that in more of a plain-English use case:

1. Suppose a data frame with a timeseries and float columns
2. For each row, I want to take a rolling window of the past 15min of data
3. For each rolling window, I want to sub-group that 15min into chunks of 5min
4. For each of those 5min chunks I want to calculate (for the float column):
   - the mean of that chunk, returned as new columns (think col names like "5min_ago_mean", "10min_ago_mean"; or this could be one column containing a list etc; same-same)
   - the diff between the first and last value in each chunk, returned as new columns (e.g. "5_min_ago_diff")

My mental model for this is that in order to do the sub-group aggregation I 'want' to treat each window from the groupby_rolling() like a dataframe (as opposed to single columns within a normal aggregation context) as I need to be able to sub-group based on the timestamps within the rolling window before I can apply any aggregations to the float column.

Please let me know if you want a more comprehensive example or if there's another approach/API method I should be looking at to achieve this.

Jan 02 '23 11:01 Dermotholmes

This exists as map_groups. There is also an __iter__ method for all group by types.

Sep 08 '23 15:09 stinodego
