bottleneck
[QUESTION] Plans for an equivalent to pandas groupby?
I just started using this library, love it.
Quick question - are there any plans for an equivalent to pandas groupby?
Something like: bn.group_by(matrix[:, :2]).reduce(matrix[:, -1], np.sum)
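For reference, here is a rough sketch of what that call would be expected to compute, written with plain NumPy. Note that `bn.group_by` does not exist in bottleneck; the helper name `group_sum` and the sample `matrix` below are made up for illustration, and this is only one possible way to do a grouped sum, not a proposed implementation.

```python
import numpy as np


def group_sum(keys, values):
    """Sum `values` within each distinct row of `keys` (hypothetical helper)."""
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    # Map each row of `keys` to the index of its distinct (sorted) row.
    unique_keys, inverse = np.unique(keys, axis=0, return_inverse=True)
    # Flatten: some NumPy versions return a 2-D inverse when axis is given.
    inverse = np.ravel(inverse)
    # bincount with weights accumulates one sum per group index.
    sums = np.bincount(inverse, weights=values, minlength=len(unique_keys))
    return unique_keys, sums


# Illustrative data: the first two columns are the group keys,
# the last column holds the values to reduce.
matrix = np.array([
    [0, 0, 1.0],
    [0, 1, 2.0],
    [0, 0, 3.0],
    [1, 1, 4.0],
    [0, 1, 5.0],
])

unique_keys, sums = group_sum(matrix[:, :2], matrix[:, -1])
for key, total in zip(unique_keys, sums):
    print(key, total)
```

The sort inside `np.unique` dominates the cost here, which is the part a dedicated C or compiled routine could plausibly speed up.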
To be honest, I hadn't considered it. Are you looking to avoid a pandas dependency or see this as a way to get more performance?
The latter, to get more performance. I believe pandas groupby has been optimized (not sure if via Cython) but a bottleneck C function would provide substantial speed gains.
Okay, thanks for clarifying. I'll keep this open in case someone would like to try out PRs in this vein, but probably won't take a more serious look at this myself until I clear out the backlog.
FYI for anyone looking for these — numbagg has groupby functions. It makes a good complement to bottleneck...