How to ignore NaN in reduce?

Open randolf-scholz opened this issue 3 years ago • 1 comment

NumPy and many other libraries have introduced additional aggregation functions that ignore NaN values, for instance:

  • numpy.nan[sum, mean, min, max, argmin, argmax, median, std, var, prod, quantile, percentile]
  • torch.nan[sum, mean, median, quantile]
  • tensorflow.experimental.numpy
  • jax.numpy
  1. Use cases. This would mostly be a convenience. It avoids aggregating over NaN values when working with data that has missing values, or when padding. Padding with NaNs instead of 0s has the advantage that any computation that accidentally uses the padding values yields NaN again, which makes such bugs easier to notice.
  2. Implementation. Either avoid iterating over NaN values altogether, or choose a masking value appropriate for the chosen reduction, e.g.
    • nansum → replace NaN with 0
    • nanprod → replace NaN with 1
    • nanmax → replace NaN with -Inf
  3. Integrity - does it interplay well with existing operations and notation in einops? It is a simple additional boolean flag ignore_nan for reduce.
  4. Readability. Alternatively, one could have a nanreduce that does the same thing but is visually more striking.

Similarly, one could consider an additional ignore_infinite flag.
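
The masking idea from item 2 can be sketched in plain NumPy as follows. This is only an illustration of the proposal, not einops code: the helper name masked_reduce and the lookup tables are hypothetical, and the "min" entry is an extra not listed above.

```python
import numpy as np

# Neutral element for each reduction: substituting it for NaN
# leaves the result over the remaining values unchanged.
# (Hypothetical tables, for illustration only.)
FILL_VALUE = {"sum": 0.0, "prod": 1.0, "max": -np.inf, "min": np.inf}
REDUCER = {"sum": np.sum, "prod": np.prod, "max": np.max, "min": np.min}


def masked_reduce(x, op, axis):
    """Reduce x along axis, ignoring NaN by replacing it with a neutral value."""
    filled = np.where(np.isnan(x), FILL_VALUE[op], x)
    return REDUCER[op](filled, axis=axis)


x = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, 4.0]])

print(masked_reduce(x, "sum", axis=1))   # same values as np.nansum(x, axis=1)
print(masked_reduce(x, "prod", axis=1))  # same values as np.nanprod(x, axis=1)
print(masked_reduce(x, "max", axis=1))   # same values as np.nanmax(x, axis=1)
```

One caveat of this approach: a slice that is entirely NaN reduces to the fill value (for example -Inf for max), whereas np.nanmax returns NaN with a warning in that case.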

Jan 17 '22 09:01 randolf-scholz

This is already supported: einops.reduce accepts a callable as the reduction. Example:

einops.reduce(array, 'i j k -> (i j)', np.nanmean)

Jan 18 '22 07:01 arogozhnikov