
numpy.nanmean() does not skip nan±… or …±nan

ricleal opened this issue on Sep 07, 2016 · 6 comments

Hello!

First of all, great piece of work! It's saving me a lot of time :)

I'm having issues with numpy.nanmean that should ignore nan values when calculating the mean.

Here is some test code:

from uncertainties import unumpy
import numpy as np
v = np.arange(16,dtype=np.float64)
e = np.sqrt(v)
v[1:3] = np.nan
print(v)
print(np.isnan(v[1:3]))
un = unumpy.uarray(v,e)
print(un)
print(un.mean())
print(np.nanmean(un))
print(v.mean())
print(np.nanmean(v))

Here is the output:

[  0.  nan  nan   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.]
[ True  True]
[0.0+/-0 nan+/-1.0 nan+/-1.4142135623730951 3.0+/-1.7320508075688772
 4.0+/-2.0 5.0+/-2.23606797749979 6.0+/-2.449489742783178
 7.0+/-2.6457513110645907 8.0+/-2.8284271247461903 9.0+/-3.0
 10.0+/-3.1622776601683795 11.0+/-3.3166247903554 12.0+/-3.4641016151377544
 13.0+/-3.605551275463989 14.0+/-3.7416573867739413
 15.0+/-3.872983346207417]
nan+/-0.6846531968814576
nan+/-0.6846531968814576
nan
8.35714285714

From the output, you can see that both mean and nanmean are returning nan+/-error. I'd say that the latter should return the mean ignoring the nan values.

I hope you can help with that! Thanks

ricleal avatar Sep 07 '16 15:09 ricleal

Thanks.

Strictly speaking, this is the expected behavior: nan±… is not nan, and NumPy skips nan (only).

Now, unumpy.isnan() works as you want and could be used as a mask, or for boolean indexing.
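For instance, a minimal sketch of that masking approach (the array contents here are made up for illustration):

from uncertainties import unumpy
import numpy as np

arr = unumpy.uarray([0.0, np.nan, 2.0], [0.1, 0.2, 0.3])
mask = unumpy.isnan(arr)    # flags the nan+/-... entry: [False, True, False]
print(arr[~mask].mean())    # mean of the remaining entries, about 1.0+/-0.16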

I will check whether there is any way to make NumPy understand that nan±… should be treated like nan by nanmean().

lebigot avatar Sep 08 '16 08:09 lebigot

Wouldn't it be preferable to make ufloat(np.nan, 2) return np.nan directly? nan+/-2.0 doesn't really make sense anyway (same as 2.0+/-nan)?

rth avatar Sep 08 '16 09:09 rth

The general idea of never producing nan±… but producing nan instead seems reasonable, since we have basically no information on the number (with uncertainty) in question. Implementing this goes beyond changing the creation of nan±… with ufloat(), as there are many other ways of creating a number with uncertainty. I guess that this is quite doable, though, so it is probably something to be implemented.
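As a rough illustration of the idea (a purely hypothetical wrapper, not part of the module), guarding only the ufloat() entry point might look like this, but the same guard would be needed everywhere a number with uncertainty can be produced:

import math
from uncertainties import ufloat

def nan_collapsing_ufloat(nominal_value, std_dev):
    # Hypothetical helper: collapse nan+/-... into a plain float nan at creation time.
    if math.isnan(nominal_value):
        return float("nan")
    return ufloat(nominal_value, std_dev)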

±inf±… seems like it could be handled in a similar way.

Now, I would have to think about 2±nan a bit more: the nominal value is still relevant (it is the same as in a calculation without uncertainties), and the nan just shows that calculating the uncertainty with linear error propagation theory does not give a good result. The mean of numbers that include this one could thus have a relevant nominal value, with an uncertainty of nan indicating that the uncertainty is not to be trusted; that is an important piece of information, but it does not invalidate the relevance of the nominal value.
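For example (a quick illustration, with made-up values):

import numpy as np
from uncertainties import ufloat

nan = float("nan")
a = np.array([ufloat(2, nan), ufloat(4, 1)], dtype=object)
print(a.mean())  # 3.0+/-nan: the nominal value is meaningful, only the uncertainty is undefined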

lebigot avatar Sep 08 '16 20:09 lebigot

First, thanks a lot for this extremely useful module!

I have just been playing around with this and discovered that if I convert all occurrences of nan+/-nan to a plain NaN and then run np.nanmean(), I still get values like nan+/-23.4.

So apparently, there is no way to do a nanmean with uncertainties...?

thriveth avatar Jan 11 '18 20:01 thriveth

Thanks!

It is actually possible to compute a NaN-mean even when you are using uncertainties. With

>>> import uncertainties as unc
>>> from uncertainties import unumpy
>>> import numpy as np

>>> nan = float("nan")
>>> arr = np.array([nan, unc.ufloat(nan, 1), unc.ufloat(1, nan), 2])
>>> arr
array([nan, nan+/-1.0, 1.0+/-nan, 2], dtype=object)

you can get the NaN-mean by selecting only the values with a non-NaN nominal value:

>>> arr[~unumpy.isnan(arr)].mean()
1.5+/-nan

or more directly by asking NumPy to skip them:

>>> np.ma.array(arr, mask=unumpy.isnan(arr))
masked_array(data=[--, --, 1.0+/-nan, 2],
             mask=[ True,  True, False, False],
       fill_value='?',
            dtype=object)
>>> _.mean()
1.5+/-nan

In this case the uncertainty is NaN as it should be, because one of the numbers does have an undefined uncertainty, which makes the final uncertainty undefined (but not the average). In general, uncertainties are not NaN and you obtain the mean of the non-NaN values.
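(For completeness, here is a made-up example with finite uncertainties, where the same selection yields an ordinary mean:)

>>> finite = np.array([unc.ufloat(nan, 1), unc.ufloat(1, 0.5), unc.ufloat(3, 0.5)], dtype=object)
>>> finite[~unumpy.isnan(finite)].mean()  # about 2.0+/-0.35: no NaN is left in the result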

(Edited so as to reflect the fact that the uncertainties module already provides uncertainties.umath.isnan() and uncertainties.unumpy.isnan().)

lebigot avatar Jan 13 '18 10:01 lebigot

PS: I added all the information (and more) from my post above to the documentation: http://uncertainties-python-package.readthedocs.io/en/latest/genindex.html#N. Thank you for your feedback!

lebigot avatar Jan 13 '18 17:01 lebigot