matplotlib icon indicating copy to clipboard operation
matplotlib copied to clipboard

Ignore np.nan values in Normalize.autoscale()

Open epistoteles opened this issue 1 year ago • 2 comments

PR summary

When using colormaps to map scalars to colors, it is necessary to normalize the values into the range [0,1] beforehand. The Normalize class offers this functionality and includes the method autoscale to determine a sensible value vmin and vmax for scaling automatically. However, in its current state, autoscale will set vmin and vmax to nan when presented with the input np.array([1, 2, np.nan]). Because of this, calling an Normalize instance on this will transform the entire array to nan.

Example pre-PR:

>>> x = np.array([1, 2, 3, np.nan])
>>> n = Normalize()
>>> n.autoscale(x)
>>> n(x)
masked_array(data=[nan, nan, nan, nan],
             mask=False,
       fill_value=1e+20)

Example post-PR:

>>> x = np.array([1, 2, 3, np.nan])
>>> n = Normalize()
>>> n.autoscale(x)
>>> n(x)
masked_array(data=[ 0.,  0.5, 1., nan],
             mask=False,
       fill_value=1e+20)

This PR closes #28405.

Additonal comment: the values np.inf and -np.inf currently also break the autoscale function. It is not obvious how the autoscaler should behave when presented with such values. I'd be open to modify this PR in such a way that they also get ignored.

PR checklist

epistoteles avatar Jun 16 '24 17:06 epistoteles

I agree this is a bit of an annoyance, but I also wonder if this would be a bit of a whack-a-mole to try and fix everywhere. For example, there are a few other autoscale_None calls on other norms.

I wonder if it would be easier to use A = cbook.safe_masked_invalid(A) above this so we are entering without any of the invalid possibilities.

FYI: It looks like there have been some micro-optimizations in this area, so maybe people do not want the extra speed penalty to support this? https://github.com/matplotlib/matplotlib/pull/26335

greglucas avatar Jun 23 '24 14:06 greglucas

I believe we need a clear understanding and documentation, what A can be.

Typically in colormapping scenarios, A is the array ScalarMappable._A, and set_array() already runs through cbook.safe_masked_invalid(A). So if I'm not mistaken, at the bare minimum it should be sufficient to document that A is not expected to have invalid values. One can only be trapped here if one passes values manually to autoscale_None().

It's t.b.d. whether we want to make that (relatively rare) case be more comfortable as well. And that decision includes a look at the performance tradeof. If it's negligible, we can still do it.

timhoffm avatar Oct 07 '24 13:10 timhoffm