Improvement suggestion for sum_of_squares
Hi Deepak,
thanks for using numpy_groupies in flox. Out of curiosity I was reading through the codebase and noted that line:
https://github.com/dcherian/flox/blob/227ce041e9b78f8432c62370c9b2c46dd91fa657/flox/aggregate_npg.py#L17
It creates an intermediate array for the squares, which then gets summed up. In this context, this is likely to be a relevant time waster for allocating that intermediate array for the squares, while it could all be done in one go with simply a derived class based on
https://github.com/ml31415/numpy-groupies/blob/c11987005cccea1b3e990aba02ece482f3a00b1d/numpy_groupies/aggregate_numba.py#L240
class SumOfSquares(Sum):
@staticmethod
def _inner(ri, val, ret, counter, mean):
counter[ri] = 0
ret[ri] += val * val
Should do the trick and should be free of extra array and memory allocation and run in roughly the same time as the plain grouped sum.
Thanks Michael. That looks great.
I definitely need to do a performance pass. I've spent a lot of time trying to match xarray's current behaviour, and work more efficiently with dask, so I mostly skipped the performance aspect with numpy arrays.