Ignore np.nan values in Normalize.autoscale()
PR summary
When using colormaps to map scalars to colors, it is necessary to normalize the values into the range [0,1] beforehand. The Normalize class offers this functionality and includes the method autoscale to determine a sensible value vmin and vmax for scaling automatically. However, in its current state, autoscale will set vmin and vmax to nan when presented with the input np.array([1, 2, np.nan]). Because of this, calling an Normalize instance on this will transform the entire array to nan.
Example pre-PR:
>>> x = np.array([1, 2, 3, np.nan])
>>> n = Normalize()
>>> n.autoscale(x)
>>> n(x)
masked_array(data=[nan, nan, nan, nan],
mask=False,
fill_value=1e+20)
Example post-PR:
>>> x = np.array([1, 2, 3, np.nan])
>>> n = Normalize()
>>> n.autoscale(x)
>>> n(x)
masked_array(data=[ 0., 0.5, 1., nan],
mask=False,
fill_value=1e+20)
This PR closes #28405.
Additonal comment: the values np.inf and -np.inf currently also break the autoscale function. It is not obvious how the autoscaler should behave when presented with such values. I'd be open to modify this PR in such a way that they also get ignored.
PR checklist
- [x] "closes #0000" is in the body of the PR description to link the related issue
- [x] new and changed code is tested
- [ N/A] Plotting related features are demonstrated in an example
- [ N/A] New Features and API Changes are noted with a directive and release note
- [ N/A] Documentation complies with general and docstring guidelines
I agree this is a bit of an annoyance, but I also wonder if this would be a bit of a whack-a-mole to try and fix everywhere. For example, there are a few other autoscale_None calls on other norms.
I wonder if it would be easier to use A = cbook.safe_masked_invalid(A) above this so we are entering without any of the invalid possibilities.
FYI: It looks like there have been some micro-optimizations in this area, so maybe people do not want the extra speed penalty to support this? https://github.com/matplotlib/matplotlib/pull/26335
I believe we need a clear understanding and documentation, what A can be.
Typically in colormapping scenarios, A is the array ScalarMappable._A, and set_array() already runs through cbook.safe_masked_invalid(A). So if I'm not mistaken, at the bare minimum it should be sufficient to document that A is not expected to have invalid values. One can only be trapped here if one passes values manually to autoscale_None().
It's t.b.d. whether we want to make that (relatively rare) case be more comfortable as well. And that decision includes a look at the performance tradeof. If it's negligible, we can still do it.