polars
Add an ```apply``` function to groupby_rolling and groupby_dynamic for a DataFrame context
Describe your feature request
Currently, groupby_rolling and groupby_dynamic can only aggregate in a pl.col context, but sometimes we need to apply a function over the whole DataFrame of each window for analysis. We can of course use pl.apply([columns], func(pl.DataFrame(columns))) to reconstruct that, but I remember that very old versions had an API along the lines of df.rolling(windows).apply. Can this be added to the current API?
expected:
df.groupby_rolling(.....).apply(
df.groupby_dynamic(.....).apply(
Maybe also an __iter__ for the groupby class?
Do you mean the same functionality as the ordinary groupby has?
This is what it does:
    def apply(
        self,
        func: Callable[[Any], Any],
        return_dtype: Optional[Type[DataType]] = None,
    ) -> DF:
        """
        Apply a function over the groups.
        """
        df = self.agg_list()
        if self.selection is None:
            raise TypeError(
                "apply not available for Groupby.select_all(). Use select() instead."
            )
        for name in self.selection:
            s = df.drop_in_place(name + "_agg_list").apply(func, return_dtype)
            s.rename(name, in_place=True)
            df[name] = s
        return df
+1
Yes, I believe this is what I'm after also.
I want to do the following steps:
- Apply a `groupby_rolling()` to a data frame
- Then, for each group/window produced by the `groupby_rolling()`, apply a `groupby()`/`groupby_dynamic()` to that window
- So that I can apply various aggregation functions to those sub-groups and return the results as new columns for each row
To put that in more of a plain-English use case:
- Suppose a data frame with a timeseries and float columns
- For each row, I want to take a rolling window of the past 15min of data
- For each rolling window, I want to sub-group that 15min into chunks of 5min
- For each of those 5min chunks I want to calculate (for the float column):
  - the mean of that chunk, returned as new columns (think column names like "5min_ago_mean", "10min_ago_mean"; this could equally be one column containing a list)
  - the diff between the first and last value in each chunk, returned as new columns (e.g. "5_min_ago_diff")

My mental model for this is that in order to do the sub-group aggregation I 'want' to treat each window from the `groupby_rolling()` like a dataframe (as opposed to single columns within a normal aggregation context), because I need to be able to sub-group based on the timestamps within the rolling window before I can apply any aggregations to the float column.
Please let me know if you want a more comprehensive example or if there's another approach/API method I should be looking at to achieve this.
This exists as `map_groups`. There is also an `__iter__` method for all group by types.