flox icon indicating copy to clipboard operation
flox copied to clipboard

Improvement suggestion for sum_of_squares

Open ml31415 opened this issue 3 years ago • 1 comments

Hi Deepak,

thanks for using numpy_groupies in flox. Out of curiosity I was reading through the codebase and noted that line:

https://github.com/dcherian/flox/blob/227ce041e9b78f8432c62370c9b2c46dd91fa657/flox/aggregate_npg.py#L17

It creates an intermediate array for the squares, which then gets summed up. In this context, this is likely to be a relevant time waster for allocating that intermediate array for the squares, while it could all be done in one go with simply a derived class based on

https://github.com/ml31415/numpy-groupies/blob/c11987005cccea1b3e990aba02ece482f3a00b1d/numpy_groupies/aggregate_numba.py#L240

class SumOfSquares(Sum):
    @staticmethod
    def _inner(ri, val, ret, counter, mean):
        counter[ri] = 0
        ret[ri] += val * val

Should do the trick and should be free of extra array and memory allocation and run in roughly the same time as the plain grouped sum.

ml31415 avatar May 18 '22 12:05 ml31415

Thanks Michael. That looks great.

I definitely need to do a performance pass. I've spent a lot of time trying to match xarray's current behaviour, and work more efficiently with dask, so I mostly skipped the performance aspect with numpy arrays.

dcherian avatar May 18 '22 15:05 dcherian