Adds function to find and plot local extrema
Description Of Changes
This PR adds a function to find local maxima and minima using scipy.ndimage.maximum_filter and scipy.ndimage.minimum_filter, plus a separate function to easily plot the extrema on a map. The plotting sets up some sensible defaults while allowing kwargs to override matplotlib.text.Text settings.
I haven't yet incorporated the PathEffects suggestion from #1234, and implemented in #1237, but can easily do so once we are sure about current implementation, if desired. This PR would supersede #1237.
This operationalizes example code that has traditionally lived in the MetPy Example Gallery.
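For the curious, the detection boils down to a filter-and-compare; a minimal sketch of the approach (illustrative only, not the exact code in this PR):

import numpy as np
from scipy.ndimage import maximum_filter

# Every point equal to the maximum over its surrounding nsize window is a
# local maximum; minima work the same way with minimum_filter
def local_max_mask(var, nsize=15):
    return var == maximum_filter(var, nsize, mode='nearest')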
Checklist
- [x] Closes #1234
- [x] Tests added
- [x] Fully documented
I think the test failure is caused by the minimum-dependency build having a 3.5-year-old version of xarray, which we could consider updating...but see my incoming review first.
I put together a version of this that uses the ax.scattertext() method we monkeypatch (😬) onto Axes (we could instead use our TextCollection class that's used within it), as well as a few other cleanups:
import numpy as np

def plot_local_extrema(ax, x, y, extrema_mask, symbol, values=None, fmt='{0:.0f}', **kwargs):
    # Sensible text defaults that can be overridden by user kwargs
    default_kwargs = {'size': 20, 'clip_on': True, 'clip_box': ax.bbox, 'color': 'black',
                      'fontweight': 'bold',
                      'horizontalalignment': 'center', 'verticalalignment': 'center'}
    kwargs = {**default_kwargs, **kwargs}
    size = kwargs.pop('size')

    # Broadcast 1D coordinates to the 2D grid before masking
    if x.ndim == 1 or y.ndim == 1:
        x, y = np.meshgrid(x, y)
    extreme_x = x[extrema_mask]
    extreme_y = y[extrema_mask]
    symbol, _ = np.broadcast_arrays(symbol, extreme_x)

    if values is None:
        return ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
    else:
        # Plot the symbol, with the formatted value in smaller text below it
        extreme_vals = values[extrema_mask]
        value_size = kwargs.pop('valuesize', size * 0.65)
        p1 = ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
        p2 = ax.scattertext(extreme_x, extreme_y, [fmt.format(v) for v in extreme_vals],
                            size=value_size, loc=(0, -14), **kwargs)
        return p1, p2
I'm torn: on one hand this feels a bit specific/fringe, but any further simplification makes it not much more useful than scattertext() itself. For instance, fmt and symbol could really be merged. In some ways the most useful thing about the function is automatically meshgridding the x/y coords. Anyone have any other thoughts?
After some discussion with @dcamron, one use case we realized needed to be covered was plotting the analyzed highs and lows from the WPC surface bulletins, so returning only a mask is kinda out. Luckily, getting the appropriate x/y indices is pretty easy and seems to work. Here's an implementation that updates scattertext a bit for the plotting and has find_local_extrema return indices along each of the dimensions:
from matplotlib import cbook
import numpy as np

from metpy.plots._mpl import TextCollection

def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter

    # Smear each extremum across the kernel, then compare against the
    # original field to locate the source points
    if extrema == 'max':
        extreme_val = maximum_filter(var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (var == extreme_val).nonzero()
def scattertext(ax, x, y, values, loc=(0, 0), fmt='{0:.0f}', **kw):
    # Start with default args and update from kw
    new_kw = {
        'verticalalignment': 'center',
        'horizontalalignment': 'center',
        'transform': ax.transData,
        'clip_on': False}
    new_kw.update(kw)

    # Handle masked arrays
    x, y, values = cbook.delete_masked_points(*np.broadcast_arrays(x, y, values))

    # If there is nothing left after deleting the masked points, return None
    if x.size == 0:
        return None

    # Format numeric values into strings for display
    if not isinstance(values[0], str):
        if isinstance(fmt, str):
            fmt = fmt.format
        values = [fmt(v) for v in values]

    # Make the TextCollection object
    text_obj = TextCollection(x, y, values, offset=loc, **new_kw)

    # The margin adjustment is a hack to deal with the fact that we don't
    # want to transform all the symbols whose scales are in points
    # to data coords to get the exact bounding box for efficiency
    # reasons. It can be done right if this is deemed important.
    # Also, only bother with this padding if there is anything to draw.
    xm, ym = ax.margins()
    if xm < 0.05:
        ax.set_xmargin(0.05)
    if ym < 0.05:
        ax.set_ymargin(0.05)

    # Add it to the axes and update range
    ax.add_artist(text_obj)

    # Matplotlib at least up to 3.2.2 does not properly clip text with paths, so
    # work around by setting to the bounding box of the Axes
    # TODO: Remove when fixed in our minimum supported version of matplotlib
    text_obj.clipbox = ax.bbox
    ax.update_datalim(text_obj.get_datalim(ax.transData))
    ax.autoscale_view()
    return text_obj
And here's the example using it:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

from metpy.cbook import get_test_data

data = xr.open_dataset(get_test_data('GFS_test.nc', as_file_obj=False))
mslp = data.Pressure_reduced_to_MSL_msl.squeeze().metpy.convert_units('hPa')

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot MSLP
clevmslp = np.arange(800., 1120., 4)
ax.contour(mslp.lon, mslp.lat, mslp, clevmslp, colors='k', linewidths=1.25,
           linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.lon[max_xind], mslp.lat[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0, -15),
            transform=ccrs.PlateCarree())

ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)
@kgoebber How does that seem?
I think this is okay, but I attempted to try it with some NAM data (using geopotential heights) and I'm running into an issue with plotting. Is there something in the TextCollection piece that makes it specifically look for lat/lon values? I get a 'NaN' error...
Test Code:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

from metpy.cbook import get_test_data
from metpy.units import units

data = xr.open_dataset(get_test_data('NAM_test.nc', as_file_obj=False)).metpy.parse_cf()
mslp = data.Geopotential_height_isobaric.metpy.sel(vertical=850*units.hPa).squeeze()
dataproj = mslp.metpy.cartopy_crs

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot 850-hPa heights (variable kept named mslp from the earlier example)
clevmslp = np.arange(0., 2000., 30)
ax.contour(mslp.x, mslp.y, mslp, clevmslp, colors='k', linewidths=1.25,
           linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.x[max_xind], mslp.y[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0, -15),
            transform=ccrs.PlateCarree())

ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)
Well, I can get a good plot if I correct your call to contour() and your last call to scattertext to use dataproj rather than ccrs.PlateCarree():
ax.contour(mslp.x, mslp.y, mslp, clevmslp, colors='k', linewidths=1.25,
           linestyles='solid', transform=dataproj)
...
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0, -15), transform=dataproj)
Though this is the image I get:
[resulting plot omitted]
So there's definitely something weird with the values, not to mention the doubled highs near New Orleans.
EDIT: OK, so the values are due to using heights, so that makes sense. Still not sure what's up with some of those too-close locations.
OK, so the "too close" points are a flaw in the algorithm: we're using a filter to smear each extremum across the kernel, then using == to find the original source point. Unfortunately, nothing prohibits multiple points within that window from sharing the same value (see the minimal demonstration after the list below).
Thoughts:
- Filter found points and take the centroid of any extrema that are clustered
- A different technique?
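Here's a minimal demonstration of the flaw (synthetic data, purely illustrative):

import numpy as np
from scipy.ndimage import maximum_filter

# Two equal maxima fall inside the same 3x3 kernel...
field = np.array([[0, 0, 0, 0],
                  [0, 5, 5, 0],
                  [0, 0, 0, 0]])
mask = field == maximum_filter(field, 3, mode='nearest')
print(np.argwhere(mask))  # ...so both (1, 1) and (1, 2) are reported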
D'oh, missed those transforms...shouldn't work that hard at the end of a day...
Okay, so there could be two potential options:
- Institute one smoothing pass with the nine-point filter to help ensure unique values, finding the extrema based on the average of the surrounding points, then give back just the boolean array so that the actual gridded values are retrieved at the end.
- Institute some form of reduce_point_density to eliminate close-by, same-value points.
In a test implementing option 1, it did in fact eliminate the double "H" by New Orleans. There is still a chance you would end up with two identical values close by (or at some other random point). This option is nice in that it doesn't create another setting the user needs to think about: we're only using the smoothing to help isolate the "correct" grid point in a unique way, not giving back any altered values. Below is the modified function for finding the local extrema.
def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter

    from metpy.calc import smooth_n_point

    # One pass of 9-point smoothing to break ties before locating extrema
    smooth_var = smooth_n_point(var, 9, 1)
    if extrema == 'max':
        extreme_val = maximum_filter(smooth_var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(smooth_var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (smooth_var == extreme_val).nonzero()
A downside to the smoothing solution is that it doesn't always land on the original extreme values, since the extremum of the smoothed field may sit on a point adjacent to the actual one. Clearly not ideal, though the reported value is likely off by no more than a rounding level of error.
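To be explicit about the trade-off: since the function returns indices, the displayed values can still be read from the unsmoothed field, so the error is confined to the location possibly being one grid point off, e.g.:

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
max_vals = mslp.values[max_yind, max_xind]  # original, unsmoothed values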
For what it's worth, here is what GEMPAK does...
https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dghilo.c https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclmp.c https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclcn.c
To summarize: it finds min/max points within a radius, then looks for clusters, keeping the center of each cluster and removing the rest.
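If we wanted to mimic that, a rough sketch of the cluster-then-centroid step might look like this (decluster_extrema and the dilation-based grouping are my assumptions, not a port of the GEMPAK code):

import numpy as np
from scipy.ndimage import binary_dilation, center_of_mass, label

def decluster_extrema(mask, nsize=15):
    # Group detections that fall within the same nsize window by dilating
    # the boolean detection mask and labeling the connected regions
    grown = binary_dilation(mask, structure=np.ones((nsize, nsize)))
    labels, num = label(grown)
    # Keep only the centroid of the original detections in each group; note
    # the centroid is not guaranteed to land on an actual extremum point
    centers = np.array(center_of_mass(mask, labels, range(1, num + 1)))
    centers = centers.round().astype(int).reshape(-1, 2)
    return centers[:, 0], centers[:, 1]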
So for reference, this is the same as skimage's peak_local_max. Looks like in 0.18 they addressed the "return multiple close points" problem by picking one arbitrarily.
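If we went the scikit-image route, usage on the earlier example's mslp field would be roughly this (assuming scikit-image >= 0.18, per the tie-handling behavior described above):

from skimage.feature import peak_local_max

# min_distance enforces separation between reported peaks, analogous to nsize
max_coords = peak_local_max(mslp.values, min_distance=15)
# Find minima by negating the field
min_coords = peak_local_max(-mslp.values, min_distance=15)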
I'm peeking at this post, which relies on some scary math ("homology") but seems like a straightforward method that gives the peaks as if the water level around them were dropping. I don't want to overthink this one, but now that I've seen the post my brain won't give it up.
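For the curious, here's a minimal sketch of the idea (my own illustration, not a finished implementation): lower a "water level" from the top of the field, growing islands around each peak; when two islands merge, the lower peak dies, and its persistence is its height minus the merge level. Minima fall out by negating the field.

import numpy as np

def persistence_peaks(var):
    """Rank the local maxima of a 2D field by topological persistence."""
    rows, cols = var.shape
    flat = var.ravel()
    order = np.argsort(flat)[::-1]  # visit grid points from high to low
    parent = {}       # union-find: flat index -> parent flat index
    birth = {}        # component root (its peak) -> peak height
    persistence = {}  # dead peak -> birth height minus merge level

    def find(i):
        # Find the root of i's component, compressing the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for idx in order:
        r, c = divmod(idx, cols)
        # Roots of any already-visited (i.e. higher) 4-connected neighbors
        roots = {find((r + dr) * cols + (c + dc))
                 for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= r + dr < rows and 0 <= c + dc < cols
                 and (r + dr) * cols + (c + dc) in parent}
        if not roots:
            # No wet neighbors: a new island is born at this peak
            parent[idx] = idx
            birth[idx] = flat[idx]
        else:
            # Join the island with the highest peak; any other islands
            # merging here die, with persistence = peak height - level
            highest = max(roots, key=birth.get)
            parent[idx] = highest
            for root in roots - {highest}:
                persistence[root] = birth[root] - flat[idx]
                parent[root] = highest

    # Islands that never merged persist over the full range of the data
    for root in {find(i) for i in parent}:
        persistence.setdefault(root, birth[root] - flat.min())

    # Return (row, col) of each peak, most persistent first
    return sorted(((divmod(root, cols), pers) for root, pers in persistence.items()),
                  key=lambda item: item[1], reverse=True)

The auto-thresholding question (how many of the returned peaks to keep) is exactly the part that still needs sorting out.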
Scipy find_peaks with the distance argument might also work? Will code up an example.
@ThomasMGeo The problem with find_peaks is that it only works with 1D input. I have an implementation of the persistent homology approach that I just need to finish with:
- An implementation of find_local_extrema (handling max/min and choosing the number of peaks)
- Tests
Yep, after fiddling with it, came to the same conclusion :)
I would amend the implementation so that highs/lows at the grid edge are omitted, in line with what GEMPAK does. I believe the for loops in the GEMPAK code perform the search on a grid that excludes the first and last row/column.
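Something like this post-filter would handle it (drop_edge_extrema is illustrative, not a port of the GEMPAK logic):

import numpy as np

def drop_edge_extrema(yind, xind, shape):
    # Keep only detections strictly inside the grid boundary
    interior = ((yind > 0) & (yind < shape[0] - 1)
                & (xind > 0) & (xind < shape[1] - 1))
    return yind[interior], xind[interior]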
Punting this from the (now December) release because there are some API decisions on the plotting side I don't want to rush, and I'm still struggling with auto-thresholding of the persistent homology approach. Not worth holding the release up for this.
Sounds good. Happy to test the implementation in the new year to help solidify the design choices.