MetPy icon indicating copy to clipboard operation
MetPy copied to clipboard

Have smoothing functions return named DataArrays

Open sgdecker opened this issue 2 years ago • 6 comments

What should we add?

Currently, functions like smooth_gaussian return DataArrays with no names. This causes problems if you want to combine DataArrays into a Dataset, as in the following example:

import datetime
import xarray as xr
import metpy.calc as mpcalc
from metpy.io import GempakGrid

plot_time = datetime.datetime(2023, 10, 30, 18)

gem_name = '/ldmdata/gempak/model/nam/23103012_nam211.gem'
gem_data = GempakGrid(gem_name)

u = gem_data.gdxarray(parameter='UREL', date_time=plot_time, level=850)[0]
v = gem_data.gdxarray(parameter='VREL', date_time=plot_time, level=850)[0]

# Fails since smooth_gaussian returns anonymous DataArray
#us = mpcalc.smooth_gaussian(u, 8)
#vs = mpcalc.smooth_gaussian(v, 8)
#wind = xr.merge((us, vs))

us = mpcalc.smooth_gaussian(u, 8).rename(u.name + '_gaussian_smoothed')
vs = mpcalc.smooth_gaussian(v, 8).rename(v.name + '_gaussian_smoothed')
wind = xr.merge((us, vs))

My feature request is that functions like this (where the output is of the same nature as the input), when given DataArrays as input, return DataArrays with reasonable names, akin to what the working version of the code does, but internally, so that users need not use .rename themselves.

Reference

No response

sgdecker avatar Nov 01 '23 19:11 sgdecker

#2369 is another instance of this problem. I think it might be possible to have our xarray handling automatically add a name. The obvious part is including something like '_{func.__name__}'. The question is, how do we leverage the input?

It's obvious in this case, where you could use the name of the 1 xarray input. But what would you do for e.g. vorticity?

dopplershift avatar Nov 01 '23 19:11 dopplershift

For vorticity, I would take my "the output is of the same nature as the input" out and do nothing! But I suppose that isn't ideal, so in cases where the output is based on multiple inputs, perhaps not leveraging the input at all is best. Thus, rather than appending '_{func.__name__}' to an existing name, for vorticity the output name would just be '{func.__name__}'. But maybe the issue is the general xarray handling code doesn't know if it is being called from smooth_gaussian vs vorticity?

sgdecker avatar Nov 01 '23 19:11 sgdecker

Actually, that's a really good point that 1 argument should be var_funcname and multiple should be funcname. The good news is that the way the xarray handling is done, this should be do-able as far as I can see (always possible I'm missing something @jthielen @dcamron ?)

One limitation is that this doesn't help us solve #2369, but that doesn't eliminate this approach.

One outstanding question: Do we have any functions that return multiple values that we need to consider how to handle?

dopplershift avatar Nov 01 '23 19:11 dopplershift

Actually, that's a really good point that 1 argument should be var_funcname and multiple should be funcname. The good news is that the way the xarray handling is done, this should be do-able as far as I can see (always possible I'm missing something @jthielen @dcamron ?)

Looking back at the relevant code, this should definitely be possible, but I suspect it may not be the wisest approach as a uniform principle (e.g., not sure we really want saturation_vapor_pressure returning something like temperature_saturation_vapor_pressure). Perhaps we need to add some kind of rename parameter for the decorator, to cover differing sets of desired behavior? That being said, I unfortunately don't see a way of doing this which doesn't add yet another layer of complexity to the xarray handling.

One outstanding question: Do we have any functions that return multiple values that we need to consider how to handle?

I unfortunately can't recall any examples off the top of my head, but the preprocess_and_wrap decorator very much has handling of multiple return values, so I would assume it would need to be addressed somehow.

jthielen avatar Nov 01 '23 21:11 jthielen

Looking back at the relevant code, this should definitely be possible, but I suspect it may not be the wisest approach as a uniform principle (e.g., not sure we really want saturation_vapor_pressure returning something like temperature_saturation_vapor_pressure). Perhaps we need to add some kind of rename parameter for the decorator, to cover differing sets of desired behavior?

Curses, foiled again. Your suggestion sounds like a good way to progress. It will result in some unfortunate complexity. I suppose this isn't a uniquely MetPy problem. I assume that's because it's not as common for people to put these back into Datasets.

@sgdecker What's the original use case for wanting the data to be present within a Dataset?

dopplershift avatar Nov 01 '23 21:11 dopplershift

@sgdecker What's the original use case for wanting the data to be present within a Dataset?

The immediate use case here was to create a BarbPlot. A more speculative use case would be code that is in a function to create a "QG-ready" Dataset from a GEMPAK file.

@jthielen 's example of temperature_saturation_vapor_pressure gets at the heart of what "of the same nature" means. I wonder if the rule could be, if the output is not a tuple and the units of the output are the same as the units of the first argument, only then is the function name appended to the existing name. In all other cases, the output name is just the function name alone.

sgdecker avatar Nov 01 '23 22:11 sgdecker