Better explain masked arrays in sigma_clip

Open astrofrog opened this issue 9 years ago • 10 comments

For many users, sigma_clip may be where they encounter masked arrays for the first time. It would be nice to improve the docstring for sigma_clip:

http://docs.astropy.org/en/stable/api/astropy.stats.sigma_clip.html#astropy.stats.sigma_clip

to include a few lines on what a masked array is (including a printed view), just before we mention that Numpy ufuncs often understand them. Alternatively, we could link to http://docs.scipy.org/doc/numpy/reference/maskedarray.html but that is a bit extensive.
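As a rough idea of the "printed view" that such a docstring could include, here is a minimal NumPy-only sketch. The values and variable names are arbitrary, chosen for illustration only:

```python
import numpy as np

# A masked array pairs the original data with a Boolean mask of the same shape;
# True in the mask marks the corresponding entry as masked (missing or invalid).
data = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, False, True, False])
arr = np.ma.masked_array(data, mask=mask)

print(arr)          # masked entries are printed as "--"
print(arr.data)     # the original values, including the masked 3.0, are kept
print(arr.mask)

# Element-wise ufuncs carry the mask through; reductions skip masked entries.
print(np.sqrt(arr))
print(arr.sum())    # 7.0 (1 + 2 + 4)
```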

astrofrog, Jun 16 '15 09:06

:+1: We have some internal users who have been repeatedly confused about the proper use of masked arrays.

embray, Jun 16 '15 12:06

Also, I think linking to the Numpy docs makes sense no matter what, but a more straightforward "hands-on" introduction might be nice too.

embray, Jun 16 '15 12:06

Here's a short introduction to masked arrays which could go in the docstring, the astropy.stats introduction, or both.


The function returns a masked array, a type of Numpy array used for handling missing or invalid entries. Masked arrays retain the original data but also store a Boolean mask of the same shape, where True indicates that the corresponding value is masked. Most Numpy ufuncs understand masked arrays: element-wise operations carry the mask through, and reductions such as mean ignore the masked entries. For example, consider the following dataset with a clear outlier:

>>> import numpy as np
>>> from astropy.stats import sigma_clip
>>> x = np.array([1, 0, 0, 1, 99, 0, 0, 1, 0])

The mean is skewed by the outlier:

>>> x.mean()
11.333333333333334

Sigma-clipping (3 sigma by default) returns a masked array, and so functions like mean will ignore the outlier:

>>> clipped = sigma_clip(x)
>>> clipped
masked_array(x = [1 0 0 1 -- 0 0 1 0],
             mask = [False False False False  True False False False False],
       fill_value = 999999)
>>> clipped.mean()
0.375
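To make the clipping step itself more concrete, here is a minimal NumPy-only sketch of iterative sigma clipping. The function name `simple_sigma_clip` and its choices (median as the center, population standard deviation, at most 5 iterations) are assumptions made for illustration; it is not astropy's implementation, which offers more options.

```python
import numpy as np

def simple_sigma_clip(x, sigma=3.0, maxiters=5):
    """Hypothetical, simplified sigma clipper for illustration only.

    Repeatedly masks points farther than `sigma` standard deviations from
    the median of the points that are still unmasked, stopping when the
    mask no longer changes or after `maxiters` passes.
    """
    clipped = np.ma.masked_array(x, mask=np.zeros(np.shape(x), dtype=bool))
    for _ in range(maxiters):
        kept = clipped.compressed()          # values not yet masked
        center = np.median(kept)
        std = np.std(kept)
        new_mask = np.abs(clipped.data - center) > sigma * std
        new_mask |= clipped.mask             # never unmask a rejected point
        if np.array_equal(new_mask, clipped.mask):
            break                            # mask is stable: converged
        clipped.mask = new_mask
    return clipped

x = np.array([1, 0, 0, 1, 99, 0, 0, 1, 0])
clipped = simple_sigma_clip(x)
print(clipped.mean())   # 0.375
```

On this dataset the first pass masks 99 (it lies about 99 away from the median of 0, while 3 times the standard deviation is about 93), and the second pass masks nothing further, so the mean of the remaining values is 3/8.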

If you need to access the original data directly, you can use the .data property. Combined with the .mask property, you can get the original outliers, or the values that were not clipped:

>>> outliers = clipped.data[clipped.mask]
>>> outliers
array([99])
>>> valid = clipped.data[~clipped.mask]
>>> valid
array([1, 0, 0, 1, 0, 0, 1, 0])
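Numpy masked arrays also provide `compressed()` and `filled()` for the same kind of task. So that the snippet runs without astropy, the mask below is set by hand (`x == 99`) as a stand-in for the sigma_clip output shown above:

```python
import numpy as np

# Hand-built stand-in for the clipped result: 99 is the masked outlier.
x = np.array([1, 0, 0, 1, 99, 0, 0, 1, 0])
clipped = np.ma.masked_array(x, mask=(x == 99))

# compressed() returns the unmasked values as a flat 1-D array; for a 1-D
# input this matches clipped.data[~clipped.mask].
valid = clipped.compressed()
print(valid)                # [1 0 0 1 0 0 1 0]

# filled() returns a plain ndarray with masked entries replaced by a value.
print(clipped.filled(-1))
```

Note that `compressed()` always flattens, so for 2-D or 3-D inputs it does not preserve the original shape.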

For more information on masked arrays, see the Numpy documentation.


Would it be worth including a mention of how NaNs interact with this function? I think a key advantage over the scipy version is that it handles NaN values by masking them automatically.
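A NumPy-only illustration of why masking NaNs matters: a plain reduction propagates the NaN, while `np.ma.masked_invalid` masks NaN and infinite entries so that reductions skip them. The data here are made up for the example:

```python
import numpy as np

x = np.array([1.0, 0.0, np.nan, 1.0, 0.0])

# A plain NumPy reduction propagates the NaN.
print(np.mean(x))        # nan

# masked_invalid() masks NaN and +/-inf entries; reductions then skip them.
masked = np.ma.masked_invalid(x)
print(masked.mask)
print(masked.mean())     # 0.5
```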

swt30, Aug 30 '15 13:08

@astrofrog If this issue is still open, can I work on it?

shwetamore1295, Jan 26 '16 06:01

@shwetamore1295 - yes!

taldcroft, Jan 26 '16 19:01

@taldcroft @embray @astrofrog Please review my commit. I have explained masked arrays in sigma_clip.py.

AMAN3003, Feb 15 '16 07:02

I think the intro to masked arrays that @swt30 wrote should go in the user guide somewhere, in some form. The docs specifically for sigma_clip should just link to that, as should other functions in Astropy that employ masked arrays. As there are several such functions, I'm not sure the astropy.stats docs are the best place for it (though it could still use examples from astropy.stats).

embray, Feb 17 '16 16:02

@AMAN3003 Can you open a pull request for your commit? Then we will be able to comment on it. However, as @embray says, it might be better to add it to the general documentation or to a separate page within the documentation about how masked arrays are used.

crawfordsm, Mar 12 '16 14:03

Within sigma_clip, is there a straightforward way to mask n-d arrays? That is, without losing the dimensions of the array, how could one apply sigma_clip to mask outliers in a 2-D or 3-D array and get an output array with the same shape as the input, but with the outliers masked?
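Since the mask of a masked array always has the same shape as its data, shape is preserved as long as the center and spread are broadcast back against the input. A minimal single-pass NumPy sketch, assuming a hypothetical helper `clip_along_axis` and a made-up stack of 50 random 4x4 frames (astropy's own sigma_clip is iterative and, in current versions, documents an `axis` argument; check the docs for your version):

```python
import numpy as np

def clip_along_axis(data, sigma=3.0, axis=None):
    """Hypothetical single-pass sigma clip that keeps the input's shape.

    The median and standard deviation are computed along `axis` with
    keepdims=True, so they broadcast against `data` and the returned mask
    has exactly the same shape as the input.
    """
    data = np.asarray(data, dtype=float)
    center = np.median(data, axis=axis, keepdims=True)
    std = np.std(data, axis=axis, keepdims=True)
    return np.ma.masked_array(data, mask=np.abs(data - center) > sigma * std)

rng = np.random.default_rng(0)
stack = rng.normal(0.0, 1.0, size=(50, 4, 4))   # 50 frames of 4x4 pixels
stack[10, 2, 3] = 100.0                         # inject one obvious outlier

# Clip each pixel independently across the 50 frames (axis=0).
clipped = clip_along_axis(stack, axis=0)
print(clipped.shape)            # (50, 4, 4), same as the input
print(clipped.mask[10, 2, 3])   # True: the outlier is masked
```

Passing `axis=None` instead clips against the statistics of the whole array, and the output shape is still unchanged.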

themiyan, Jul 02 '18 17:07

Hello! Happy to work on this, but I'm new to this project (and to open source 😅). Could you point me to exactly where you want the explanation?

hwiks, Aug 18 '22 14:08

@astrofrog @crawfordsm @taldcroft @embray I have opened a PR for this issue; please have a look.

Telomelonia, Dec 23 '22 17:12