MetPy icon indicating copy to clipboard operation
MetPy copied to clipboard

Raise error when requested level does not exist in field used for declarative plot

Open sgdecker opened this issue 2 years ago • 7 comments

What should we add?

The code below contains a fairly subtle bug, and the plot it generates is not the intended plot. (Comment out the "incorrect line" and uncomment out the "correct line" to get the correct plot.)

from datetime import datetime, timedelta

import xarray as xr
from xarray.backends import NetCDF4DataStore
from siphon.catalog import TDSCatalog
from metpy.units import units
import metpy.calc as mpcalc
from metpy.plots import ContourPlot, BarbPlot, MapPanel, PanelContainer


def get_nam212(init_time, valid_time):
    """Obtain NAM data on the 212 grid."""
    ymd = init_time.strftime('%Y%m%d')
    hr = init_time.strftime('%H')
    filename = f'{ymd}_{hr}00.grib2'
    ds_name = 'NAM_CONUS_40km_conduit_' + filename
    cat_name = ('https://thredds.ucar.edu/thredds/catalog/grib/NCEP/NAM/'
                'CONUS_40km/conduit/' + ds_name + '/catalog.xml')

    cat = TDSCatalog(cat_name)
    ds = cat.datasets[ds_name]
    ncss = ds.subset()
    query = ncss.query()
    query.time(valid_time).variables('all')
    nc = ncss.get_data(query)
    data = xr.open_dataset(NetCDF4DataStore(nc)).metpy.parse_cf()
    return data


init_time = datetime(2023, 1, 26, 12)
plot_time = init_time + timedelta(hours=6)

nam = get_nam212(init_time, plot_time)

hgt = mpcalc.smooth_gaussian(nam['Geopotential_height_isobaric'], 6).squeeze().rename('hgt')
tmp = mpcalc.smooth_gaussian(nam['Temperature_isobaric'], 6).squeeze().rename('tmp')
ugrd = mpcalc.smooth_gaussian(nam['u-component_of_wind_isobaric'], 6).squeeze().rename('ugrd')
vgrd = mpcalc.smooth_gaussian(nam['v-component_of_wind_isobaric'], 6).squeeze().rename('vgrd')
spfh = mpcalc.smooth_gaussian(nam['Specific_humidity_isobaric'], 6).squeeze().rename('spfh')

isen_lev = [292] * units('K')

nam_isen = mpcalc.isentropic_interpolation_as_dataset(isen_lev, tmp, hgt,
                                                      ugrd, vgrd, spfh,
                                                      bottom_up_search=False)

# Correct line
#nam_isen['mixr'] = mpcalc.mixing_ratio_from_specific_humidity(nam_isen['spfh'])

# Incorrect Line
nam_isen['mixr'] = mpcalc.mixing_ratio_from_specific_humidity(spfh)

cp = ContourPlot()
cp.data = nam_isen
cp.field = 'mixr'
cp.level = 292
cp.contours = range(0, 20)
cp.plot_units = 'g/kg'
cp.clabels = True

bp = BarbPlot()
bp.data = nam_isen
bp.field = ['ugrd', 'vgrd']
bp.level = 292
bp.earth_relative = False
bp.skip = (8, 6)
bp.plot_units = 'kt'

mp = MapPanel()
mp.title = f'{plot_time} 292-K Analysis'
mp.plots = [cp, bp]

pc = PanelContainer()
pc.size = (22, 17)
pc.panels = [mp]
pc.show()

However, I'm surprised that the erroneous code generates a plot at all, given that 292 is not a valid level for the 'mixr' field (i.e., DataArray). Throwing an error would have made the mistake much easier to spot, so my feature request is that this code generate an error rather than plot something.

Reference

No response

sgdecker avatar Feb 01 '23 21:02 sgdecker

Subsetting in Declarative uses DataArray.sel(method='nearest') under the hood, which is why it's going to give you something. We should at least document this.

We can consider changing the default behavior or exposing the selection method, but I'm not sure if any other discussion went into the initial decision (seems it goes back to the original declarative PR @dopplershift.) Beyond that, the persistent "how granular should declarative be" looms.

edit: if you provide units in the label-based selection, this will at least fail at the "totally incompatible values 292 K and isobaric2 in Pa" stage in this example. That wouldn't save you in a similar situation with compatible units however.

dcamron avatar Feb 01 '23 21:02 dcamron

Xarray's .sel() method has a tolerance parameter we could set here to try to make sure we return sensible matches. Doing this in general (across all vertical coordinate options) might be tricky, but we could use a default of 1% of the requested value?

dopplershift avatar Feb 01 '23 22:02 dopplershift

I think the challenge with the use of tolerance will be that this subset is operating on both vertical level and time, which will have greatly different values for tolerance, if I am readying the xarray documentation correctly. I imagine that the nearest method was included here primary for the time element because with satellite data you may not have the exact time (say 2023-02-01 at 12 UTC) because of the time variable is not refactored as would happen for GEMPAK formatted files.

So I'm wondering if we should handle the subsetting of time and vertical separately so we could be more lenient on one (e.g., using nearest method) and one where we are looking for an exact level?

kgoebber avatar Feb 02 '23 19:02 kgoebber

@kgoebber good point. We could certainly reimplement with two calls to .sel(). I think that will have minimal performance impact.

dopplershift avatar Feb 02 '23 20:02 dopplershift

Yes, I was thinking that being forgiving with respect to time is essential, but I couldn't think of a case where I don't want to specify an exact vertical level. I haven't had a chance to experiment with this yet, but I was also wondering if adding units would help in the case of my example code, since the intended level was 292 K, but it actually plotted whatever pressure level is closest to 292 Pa. In other words, would a unit mismatch trigger an error in this case?

sgdecker avatar Feb 02 '23 21:02 sgdecker

@sgdecker Yes, the units should trigger an error in that case.

dopplershift avatar Feb 02 '23 21:02 dopplershift

I can confirm that if I use cp.level = 292 * units('degK') that does trigger a DimensionalityError when the "incorrect line" is used, so that is good.

sgdecker avatar Feb 02 '23 21:02 sgdecker