How to ignore 𝙽𝚊𝙽 in reduce?

Open randolf-scholz opened this issue 3 years ago • 1 comments

Numpy and many other libraries have introduced additional aggregation functions that ignore 𝙽𝚊𝙽-values, for instance:

numpy.nan[sum, mean, min, max, argmin, argmax, median, std, var, prod, quantile, percentile]
torch.nan[sum, mean, median, quantile]
tensorflow.experimental.numpy
jax.numpy

Use-cases. This would mostly be a convenience: skipping 𝙽𝚊𝙽-values during aggregation when working with data that has missing values, or when padding. (Padding with 𝙽𝚊𝙽s instead of 0s has the advantage that any computation that accidentally uses the padding values yields 𝙽𝚊𝙽 again, which makes such bugs easier to notice.)
Implementation. Either avoid iterating over 𝙽𝚊𝙽-values altogether, or choose a masking value appropriate for the chosen reduction, e.g.
- nansum → replace 𝙽𝚊𝙽 with 0
- nanprod → replace 𝙽𝚊𝙽 with 1
- nanmax → replace 𝙽𝚊𝙽 with -𝙸𝚗𝚏
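
The masking approach listed above can be sketched in plain NumPy. This is only an illustration, not einops code: `masked_reduce`, `FILL_VALUES` and `REDUCERS` are hypothetical names, and the `min` entry (+𝙸𝚗𝚏) is an assumption added by analogy with `max`.

```python
import numpy as np

# Neutral element of each reduction: replacing NaN with it leaves the result unchanged.
# The "min" entry is an assumption added by analogy; the list above only names sum, prod, max.
FILL_VALUES = {"sum": 0.0, "prod": 1.0, "max": -np.inf, "min": np.inf}
REDUCERS = {"sum": np.sum, "prod": np.prod, "max": np.max, "min": np.min}


def masked_reduce(x, reduction, axis=None):
    """Hypothetical helper: reduce `x`, ignoring NaN by masking with the neutral element."""
    x = np.asarray(x, dtype=float)
    filled = np.where(np.isnan(x), FILL_VALUES[reduction], x)
    return REDUCERS[reduction](filled, axis=axis)


x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])
row_sums = masked_reduce(x, "sum", axis=1)  # matches np.nansum(x, axis=1)
row_max = masked_reduce(x, "max", axis=1)   # matches np.nanmax(x, axis=1)
```

A single fill value is not enough for `mean`, which also needs the count of non-𝙽𝚊𝙽 entries. Note too that a slice containing only 𝙽𝚊𝙽 reduces to the fill value here (e.g. -𝙸𝚗𝚏 for `max`) rather than to 𝙽𝚊𝙽.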
Integrity - does it interplay well with existing operations and notation in einops? It would be a single additional boolean flag, ignore_nan, on reduce.
Readability. Alternatively, one could have a nanreduce that does the same thing but is visually more striking.

Similarly, one could consider an additional ignore_infinite-flag.

Jan 17 '22 09:01 randolf-scholz

It is currently supported by providing callables for reductions in einops.reduce. Example:

einops.reduce(array, 'i j k -> (i j)', np.nanmean)

Jan 18 '22 07:01 arogozhnikov