
[ENH] Method for adding functionality to GroupBy

Open zbarry opened this issue 6 years ago • 3 comments

It would be nice to be able to add functionality to the pandas GroupBy objects: GroupBy, DataFrameGroupBy, SeriesGroupBy. There's no convenient accessor interface for this, but maybe there's a way to reliably monkeypatch them. This would let us create nifty aggregation / apply functions and avoid the .groupby(...).apply() route for tasks we encounter routinely. It could also open up opportunities to speed up such operations: .groupby().apply() can often be slow for large numbers of groups.
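As a rough illustration of the monkeypatching idea, here is a minimal sketch that attaches a function to DataFrameGroupBy with setattr. The names `register_groupby_method` and `value_range` are hypothetical and exist in neither pandas nor pyjanitor; patching a pandas class this way is not a supported extension point, so treat it as an experiment rather than a recommended design.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy


def register_groupby_method(func):
    """Hypothetical helper: expose `func` as a DataFrameGroupBy method."""
    # Plain monkeypatch: the function becomes an attribute of the class,
    # so the grouped object is passed in as the first argument.
    setattr(DataFrameGroupBy, func.__name__, func)
    return func


@register_groupby_method
def value_range(grouped, column):
    """Per-group max minus min of `column`, without going through .apply()."""
    selected = grouped[column]  # SeriesGroupBy for the chosen column
    return selected.max() - selected.min()


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "x": [1, 4, 10, 15]})
result = df.groupby("key").value_range("x")
```

Because `value_range` is built from the built-in `max` and `min` aggregations, it avoids the per-group Python call that `.groupby().apply()` makes.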

zbarry avatar Oct 12 '19 15:10 zbarry

@Zsailer - what do you think about such a capability in PF?

zbarry avatar Oct 18 '19 19:10 zbarry

@zbarry @ericmjl @pyjanitor-devs/core-devs How can we make this possible? Is this even possible?

samukweku avatar Mar 17 '22 23:03 samukweku

One way to approach this is a summarise function with a by parameter, where all the magic happens inside that function. This is inspired by the update to the summarise feature coming in dplyr 1.1, and by the use of by in rdatatable and pydatatable.

Crude API example:

df.summarise(col_name = func or arg name, by = func or kwargs)

We can even make it such that you can filter within a groupby effectively (maybe?)
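To make the crude API above a bit more concrete, here is one possible sketch written as a plain function on top of pandas named aggregation. The signature is an assumption for illustration only: each keyword maps an output column name to a `(column, func)` pair, and `by` is forwarded to `groupby`. It is not an existing pyjanitor function.

```python
import pandas as pd


def summarise(df, by=None, **aggregations):
    """Hypothetical sketch: output_name=(column, func) pairs, optional `by`."""
    if by is None:
        # No grouping: reduce each requested column to a single-row frame.
        return pd.DataFrame(
            {name: [df[col].agg(func)] for name, (col, func) in aggregations.items()}
        )
    # Grouped case: delegate to pandas named aggregation.
    return df.groupby(by).agg(**aggregations).reset_index()


df = pd.DataFrame({"key": ["a", "a", "b", "b"], "x": [1, 4, 10, 15]})
out = summarise(df, by="key", x_mean=("x", "mean"), x_max=("x", "max"))
whole = summarise(df, x_sum=("x", "sum"))
```

Keeping the grouping inside the function leaves room to add filtering or other per-group logic later without changing the call site.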

samukweku avatar Nov 29 '22 11:11 samukweku