
Adds function to find and plot local extrema

Open kgoebber opened this issue 2 years ago • 15 comments

Description Of Changes

This PR adds a function to find local maxima and minima using scipy.ndimage.maximum_filter and scipy.ndimage.minimum_filter, and a separate function to easily plot extrema on a map. The plotting sets up some sensible defaults while allowing kwargs to override matplotlib.text.Text settings.

I haven't yet incorporated the PathEffects suggestion from #1234, and implemented in #1237, but can easily do so once we are sure about the current implementation, if desired. This PR would supersede #1237.

This operationalizes example code that has traditionally lived in the MetPy Example Gallery.

Checklist

  • [x] Closes #1234
  • [x] Tests added
  • [x] Fully documented

kgoebber avatar Jul 09 '23 21:07 kgoebber

I think the test failure is caused by the minimum-dependency test environment having a 3 1/2 year old version of xarray, which we could consider updating...but see my incoming review first.

dopplershift avatar Jul 21 '23 20:07 dopplershift

I put together a version of this that uses the ax.scattertext() method we monkeypatch (😬) on Axes (we could instead use our TextCollection class that's used within), as well as a few other cleanups:

import numpy as np

def plot_local_extrema(ax, x, y, extrema_mask, symbol, values=None, fmt='{0:.0f}', **kwargs):
    defaultkwargs = {'size': 20, 'clip_on': True, 'clip_box': ax.bbox, 'color': 'black',
                     'fontweight': 'bold',
                     'horizontalalignment': 'center', 'verticalalignment': 'center'}
    kwargs = {**defaultkwargs, **kwargs}
    size = kwargs.pop('size')

    if x.ndim == 1 or y.ndim == 1:
        x, y = np.meshgrid(x, y)
    extreme_x = x[extrema_mask]
    extreme_y = y[extrema_mask]
    symbol, _ = np.broadcast_arrays(symbol, extreme_x)

    if values is None:
        return ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
    else:
        extreme_vals = values[extrema_mask]
        valuesize = kwargs.pop('valuesize', size * 0.65)
        p1 = ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
        p2 = ax.scattertext(extreme_x, extreme_y, [fmt.format(v) for v in extreme_vals],
                            size=valuesize, loc=(0, -14), **kwargs)
        return p1, p2

I'm torn because on one hand this feels a bit specific/fringe, but any further simplification makes it not much more useful than scattertext() itself. For instance, fmt and symbol could really be merged. In some ways the most useful thing about that function is automatically meshgridding the x/y coords. Anyone have any other thoughts?
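For reference, the meshgridding mentioned above is what lets 1-D coordinate vectors be paired with a 2-D extrema mask; a minimal sketch of that behavior (names here are illustrative):

```python
import numpy as np

# 1-D coordinate vectors and a 2-D mask flagging one extremum
lon = np.array([0., 1., 2., 3.])
lat = np.array([10., 20., 30.])
mask = np.zeros((3, 4), dtype=bool)
mask[1, 2] = True

# meshgrid broadcasts the coords to the mask's 2-D shape so boolean
# indexing pairs each flagged cell with its (x, y) location
x2d, y2d = np.meshgrid(lon, lat)
extreme_x = x2d[mask]  # array([2.])
extreme_y = y2d[mask]  # array([20.])
```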

dopplershift avatar Jul 24 '23 21:07 dopplershift

After some discussion with @dcamron, one use case we realized needed to be covered is plotting the analyzed highs and lows from the WPC surface bulletins, so including the masking is kinda out. Luckily, getting the appropriate x/y indices is pretty easy and seems to work. Here's an implementation that updates scattertext a bit for the plotting and has find_local_extrema return indices along each of the dimensions:

import numbers

from matplotlib import cbook
import numpy as np
from metpy.plots._mpl import TextCollection

def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter

    if extrema == 'max':
        extreme_val = maximum_filter(var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (var == extreme_val).nonzero()


def scattertext(ax, x, y, values, loc=(0, 0), fmt='{0:.0f}', **kw):
    # Start with default args and update from kw
    new_kw = {
        'verticalalignment': 'center',
        'horizontalalignment': 'center',
        'transform': ax.transData,
        'clip_on': False}
    new_kw.update(kw)

    # Handle masked arrays
    x, y, values = cbook.delete_masked_points(*np.broadcast_arrays(x, y, values))

    # If there is nothing left after deleting the masked points, return None
    if x.size == 0:
        return None

    if not isinstance(values[0], str):
        if isinstance(fmt, str):
            fmt = fmt.format
        values = [fmt(v) for v in values]
    
    # Make the TextCollection object
    text_obj = TextCollection(x, y, values, offset=loc, **new_kw)

    # The margin adjustment is a hack to deal with the fact that we don't
    # want to transform all the symbols whose scales are in points
    # to data coords to get the exact bounding box for efficiency
    # reasons.  It can be done right if this is deemed important.
    # Also, only bother with this padding if there is anything to draw.
    xm, ym = ax.margins()
    if xm < 0.05:
        ax.set_xmargin(0.05)

    if ym < 0.05:
        ax.set_ymargin(0.05)

    # Add it to the axes and update range
    ax.add_artist(text_obj)

    # Matplotlib at least up to 3.2.2 does not properly clip text with paths, so
    # work-around by setting to the bounding box of the Axes
    # TODO: Remove when fixed in our minimum supported version of matplotlib
    text_obj.clipbox = ax.bbox

    ax.update_datalim(text_obj.get_datalim(ax.transData))
    ax.autoscale_view()
    return text_obj

and here's the example using it:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

from metpy.cbook import get_test_data

data = xr.open_dataset(get_test_data('GFS_test.nc', as_file_obj=False))
mslp = data.Pressure_reduced_to_MSL_msl.squeeze().metpy.convert_units('hPa')

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot MSLP
clevmslp = np.arange(800., 1120., 4)
ax.contour(mslp.lon, mslp.lat, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.lon[max_xind], mslp.lat[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=ccrs.PlateCarree())


ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)

@kgoebber How does that seem?

dopplershift avatar Jul 28 '23 23:07 dopplershift

I think this is okay, but I attempted to try it with some NAM data (using geopotential heights) and I am running into an issue with plotting. Is there something in the TextCollection piece that is making it specifically look for lat/lon values? I get a 'NaN' error...

Test Code:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

from metpy.cbook import get_test_data
from metpy.units import units

data = xr.open_dataset(get_test_data('NAM_test.nc', as_file_obj=False)).metpy.parse_cf()
mslp = data.Geopotential_height_isobaric.metpy.sel(vertical=850*units.hPa).squeeze()
dataproj = mslp.metpy.cartopy_crs

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot MSLP
clevmslp = np.arange(0., 2000., 30)
ax.contour(mslp.x, mslp.y, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.x[max_xind], mslp.y[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=ccrs.PlateCarree())


ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)

kgoebber avatar Aug 01 '23 22:08 kgoebber

Well I can get a good plot if I correct your call to contour() and your last call to scattertext to use dataproj rather than ccrs.PlateCarree():

ax.contour(mslp.x, mslp.y, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=dataproj)
...
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=dataproj)

Though this is the image:

[image: resulting plot]

so there's definitely something weird with the values, not to mention the highs near New Orleans.

EDIT: Ok, so the values are due to using heights, so that makes sense. Still not sure what's up with some of those close locations.

dopplershift avatar Aug 01 '23 22:08 dopplershift

Ok, so the "too close" points are a flaw in the algorithm. Essentially we're using a filter to smear the extrema within the kernel, then using == to find where the original source datapoint is. Unfortunately, there's nothing prohibiting multiple points within that area having the same value.

Thoughts:

  1. Filter found points and take the centroid of any extrema that are clustered
  2. A different technique?
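A minimal reproduction of the flaw, assuming the same filter-and-compare approach: a smooth single peak plus one neighboring cell bumped to the same extreme value yields two detections from the equality test.

```python
import numpy as np
from scipy.ndimage import maximum_filter

# A smooth single peak centered at (3, 3)...
y, x = np.mgrid[0:7, 0:7]
field = -((x - 3.) ** 2 + (y - 3.) ** 2)
# ...with one neighboring cell duplicated to the same extreme value
field[3, 2] = field[3, 3]

# The equality test flags BOTH cells as "the" local maximum
hits = np.argwhere(field == maximum_filter(field, size=3, mode='nearest'))
```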

dopplershift avatar Aug 01 '23 22:08 dopplershift

D'oh, missed those transforms...shouldn't work that hard at the end of a day...

Okay, so there could be two potential options:

  1. Apply one pass of the nine-point smoother so that extrema are located from the average of the surrounding points (helping ensure unique values), but return only the boolean array/indices so that the actual gridded values are retrieved at the end.
  2. Institute some form of reduce_point_density to eliminate close by, same value points.

In a test with implementing option 1, it did in fact eliminate the double "H" by New Orleans. There is still a chance that you would end up with two same values close by (or at some other random point). This option is nice in that it doesn't create another setting the user needs to think about. We're only using the smoothing to help isolate the "correct" grid point in a unique way, but not giving back any altered values. Below is the modified function for finding the local extrema.

def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter
    from metpy.calc import smooth_n_point

    smooth_var = smooth_n_point(var, 9, 1)
    if extrema == 'max':
        extreme_val = maximum_filter(smooth_var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(smooth_var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (smooth_var == extreme_val).nonzero()

[image: resulting plot]

A downside to the smoothing solution is that it doesn't always locate the original extreme values, since the smoothed extremum may fall on a point adjacent to the actual one. Clearly not ideal, though the reported value is likely off by no more than a rounding level of error.
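A quick 1-D illustration of that shift, using a simple running mean as a stand-in for smooth_n_point (an assumption for self-containment, not the MetPy function itself):

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

# The raw maximum sits at index 2, but the heavier values on the left
# pull the smoothed maximum over to index 1, so locating extrema on the
# smoothed field reports a neighboring grid point.
var = np.array([3., 6., 7., 0., 0.])
smooth = uniform_filter1d(var, size=3, mode='nearest')

raw_loc = int(np.argmax(var))       # 2
smooth_loc = int(np.argmax(smooth)) # 1
```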

kgoebber avatar Aug 02 '23 14:08 kgoebber

For what it's worth, here is what GEMPAK does...

https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dghilo.c https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclmp.c https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclcn.c

To summarize: it finds min/max points within a radius, then looks for clusters, keeping the center of each cluster and removing the rest.
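That cluster-and-keep-the-center idea could be sketched in Python with scipy.ndimage; the function name, the dilation width, and the rounded-centroid choice here are all illustrative assumptions, not GEMPAK's exact algorithm or MetPy API:

```python
import numpy as np
from scipy.ndimage import center_of_mass, label, maximum_filter

def cluster_extrema(var, nsize=3):
    """Detect candidate maxima with a moving-window filter, merge nearby
    detections into clusters, and keep one representative grid point per
    cluster (the rounded centroid of the cluster's detections)."""
    mask = var == maximum_filter(var, nsize, mode='nearest')
    # Dilate the detections so points within one grid cell of each other
    # fall into the same labeled cluster
    dilated = maximum_filter(mask.astype(np.uint8), size=3)
    clusters, nclusters = label(dilated)
    # Centroid of the original detections within each cluster
    centers = center_of_mass(mask.astype(float), clusters,
                             range(1, nclusters + 1))
    return [tuple(int(round(c)) for c in center) for center in centers]
```

Applied to a field with two adjacent, equal-valued maxima, this collapses the pair to a single representative point.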

kgoebber avatar Aug 02 '23 14:08 kgoebber

So for reference, this is the same as skimage's peak_local_max. Looks like in 0.18 they addressed the "return multiple close points" problem by picking one arbitrarily.

I'm peeking at this post, which gives something with some scary math ("Homology") behind it, but seems like a straightforward method that gives the peaks as if the water level around them was dropping. I don't want to overthink this one, but now that I've seen the post my brain won't give it up.
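The water-level idea can be sketched as a union-find pass over cells sorted by height: each new connected component is born at a peak, and when two components merge the lower peak "dies", with persistence = peak height minus merge level. This is an illustrative implementation of the technique from that post, not MetPy's eventual API:

```python
import numpy as np

def persistence_peaks(field):
    """Find 2-D local maxima ranked by topological persistence."""
    ny, nx = field.shape
    order = np.argsort(field, axis=None)[::-1]  # cells, highest first
    parent = {}  # union-find parent per visited cell (flat index)
    birth = {}   # component root -> height of its peak
    pers = {}    # peak flat index -> persistence (None until it dies)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for flat in order:
        y, x = divmod(int(flat), nx)
        level = field[y, x]
        # Roots of the already-visited 4-connected neighbors
        roots = set()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < ny and 0 <= xx < nx and yy * nx + xx in parent:
                roots.add(find(yy * nx + xx))
        if not roots:
            parent[flat] = flat  # a new component is born: a peak
            birth[flat] = level
            pers[flat] = None
        else:
            # Merge into the component with the highest peak; the lower
            # components die at this water level
            highest = max(roots, key=birth.get)
            parent[flat] = highest
            for r in roots:
                if r != highest:
                    pers[r] = birth[r] - level
                    parent[r] = highest

    # The surviving component holds the global maximum
    for r, p in pers.items():
        if p is None:
            pers[r] = birth[r] - field.min()
    return sorted(((divmod(int(k), nx), float(v)) for k, v in pers.items()),
                  key=lambda kv: kv[1], reverse=True)
```

On a toy field with peaks of 5 and 3 separated by a flat valley, the two real peaks come back with persistences 5 and 3; flat background cells show up only as zero-persistence noise that a threshold would drop.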

dopplershift avatar Aug 02 '23 19:08 dopplershift

Scipy find_peaks with the distance argument might also work? Will code up an example.

ThomasMGeo avatar Aug 07 '23 16:08 ThomasMGeo

@ThomasMGeo The problem with find_peaks is that it only works with 1D input. I have an implementation of the persistent homology approach that I just need to finish with:

  1. An implementation of find_local_extrema (handling max/min and choosing the number of peaks)
  2. Tests
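For completeness, here's what the distance idea looks like on a 1-D series, where find_peaks does apply (a small sketch; the numbers are made up):

```python
import numpy as np
from scipy.signal import find_peaks

series = np.array([0., 1., 0., 2., 0.])

# Without constraints, both local maxima are returned
peaks, _ = find_peaks(series)              # indices [1, 3]

# distance enforces a minimum separation between detected peaks,
# discarding smaller peaks first, so only the taller one survives
close, _ = find_peaks(series, distance=3)  # indices [3]
```

find_peaks rejects 2-D input outright, hence the need for a different approach on gridded fields.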

dopplershift avatar Aug 07 '23 17:08 dopplershift

Yep, after fiddling with it, came to the same conclusion :)

ThomasMGeo avatar Aug 07 '23 17:08 ThomasMGeo

I would amend the implementation so that the highs/lows at the grid edge are omitted, in line with what GEMPAK does. I believe the for loops in the GEMPAK code are performing the search on a grid that excludes the first and last row/column.
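That edge trimming could be a simple post-filter on the returned index arrays; this helper is a hypothetical sketch (name and interface are not MetPy API):

```python
import numpy as np

def trim_edge_extrema(yind, xind, shape):
    """Drop extrema detected on the outermost row/column of the grid,
    in line with GEMPAK's interior-only search."""
    yind = np.asarray(yind)
    xind = np.asarray(xind)
    ny, nx = shape
    interior = ((yind > 0) & (yind < ny - 1)
                & (xind > 0) & (xind < nx - 1))
    return yind[interior], xind[interior]
```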

sgdecker avatar Oct 30 '23 19:10 sgdecker

Punting from this (now December) release because there are some API decisions I don't want to rush on plotting and I'm still struggling with auto-thresholding of the persistent homology approach. Not worth holding up for this.

dopplershift avatar Dec 22 '23 20:12 dopplershift

Sounds good. Happy to test in the new year to help solidify implementation choices.

kgoebber avatar Dec 22 '23 21:12 kgoebber