Handling of NaN

Open kmuehlbauer opened this issue 6 years ago • 2 comments

Short question, is it somehow possible to extend this to handle NaN, like numpy nanmedian?

kmuehlbauer avatar May 16 '18 08:05 kmuehlbauer

Hi @kmuehlbauer, that's a good idea. I'll have to give some thought to how this could be implemented for each rolling iterator without affecting complexity.

For now, it should be straightforward to do this for some of the functions, just by using a generator with an appropriate fill-value. For example, Sum, filling NaN with 0:

>>> import math
>>> import rolling
>>> array = [1, 2, math.nan, 7, math.nan, 3, 2]
>>> array_fill_nan = (0 if math.isnan(x) else x for x in array) # generator, fills NaN values
>>> list(rolling.Sum(array_fill_nan, 3))
[3, 9, 7, 10, 5]

However, this approach doesn't work for Median, as the fill value required at each step is not necessarily constant. I'll see whether adding support for missing values is feasible here. FWIW I think pandas just considers the whole window to be NaN if it contains at least one NaN value.
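To illustrate why a constant fill value falls short for the median, here is a small standard-library sketch for a single window. The helper names `nan_median` and `propagate_nan_median` are made up for this example and are not part of rolling, NumPy or pandas.

```python
import math
from statistics import median

def nan_median(window):
    """Median of the non-NaN values; NaN if every value is NaN."""
    values = [x for x in window if not math.isnan(x)]
    return median(values) if values else math.nan

def propagate_nan_median(window):
    """Median that becomes NaN as soon as the window holds any NaN."""
    return math.nan if any(math.isnan(x) for x in window) else median(window)

window = [1, math.nan, 7]

skipped = nan_median(window)                                # median of [1, 7] -> 4.0
filled = median(0 if math.isnan(x) else x for x in window)  # median of [1, 0, 7] -> 1
propagated = propagate_nan_median(window)                   # NaN, since the window holds a NaN
```

Filling with 0 drags the median down to 1, whereas skipping the NaN gives 4.0. No single constant reproduces the skipped result for every window, which is the difficulty described above.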

If your window size is small, rolling.Apply(array, window_size, operation=np.nanmedian) should still be quite fast.
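For reference, the same idea can be sketched with only the standard library, recomputing a NaN-ignoring median over each full window. This is an illustration of the approach, not how rolling.Apply is implemented, and `rolling_nanmedian` is a hypothetical name.

```python
import math
from collections import deque
from statistics import median

def rolling_nanmedian(iterable, window_size):
    """Yield the median of each full window, ignoring NaN values."""
    window = deque(maxlen=window_size)  # oldest value drops out automatically
    for value in iterable:
        window.append(value)
        if len(window) == window_size:
            values = [x for x in window if not math.isnan(x)]
            # A window made up entirely of NaN has no median, so yield NaN.
            yield median(values) if values else math.nan

array = [1, 2, math.nan, 7, math.nan, 3, 2]
result = list(rolling_nanmedian(array, 3))
# result == [1.5, 4.5, 7, 5.0, 2.5]
```

Each window is filtered and sorted from scratch, so the cost grows with the window size, which is why this is only reasonable for small windows.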

ajcr avatar May 17 '18 19:05 ajcr

@ajcr Thanks for looking into this. I'll definitely try your suggestion using rolling.Apply(array, window_size, operation=np.nanmedian).

kmuehlbauer avatar May 18 '18 07:05 kmuehlbauer