[Feature]: Consider adding vertical regridding feature to reconstruct pressure from hybrid
Is your feature request related to a problem?
Related to #45 and #388.
Discussed in https://github.com/xCDAT/xcdat/discussions/440
Originally posted by tomvothecoder March 27, 2023
Hi @xCDAT/core-developers or anybody in the xCDAT community. Have any of you come across a package that implements an xarray-based function similar to cdutil.vertical.reconstructPressureFromHybrid? I am working on refactoring e3sm_diags and e3sm_to_cmip to use xarray/xCDAT and there are references to this function that I need to replace. Thanks!
Docstring:
def reconstructPressureFromHybrid(ps, A, B, Po):
"""
Reconstruct the Pressure field on sigma levels, from the surface pressure
:param Ps: Surface pressure
:param A: Hybrid Conversion Coefficient, such as: p=B.ps+A.Po.
:param B: Hybrid Conversion Coefficient, such as: p=B.ps+A.Po.
:param Po: Hybrid Conversion Coefficient, such as: p=B.ps+A.Po
:param Ps: surface pressure
.. note::
A and B are 1d sigma levels.
Po and Ps must have same units.
:returns: Pressure field, such as P=B*Ps+A*Po.
:Example:
.. doctest:: vertical_reconstructPressureFromHybrid
>>> P=reconstructPressureFromHybrid(ps,A,B,Po)
"""
...
Examples Usages:
- https://github.com/E3SM-Project/e3sm_diags/blob/4a82b0ec3b2a81233b294fa505c6aa8228ecbd27/e3sm_diags/driver/utils/general.py#L132-L146
- https://github.com/E3SM-Project/e3sm_to_cmip/blob/70f7dad381ef304b022826b540df0e7b46ba9912/e3sm_to_cmip/cmor_handlers/vars/pfull.py#L38-L47
- https://github.com/E3SM-Project/e3sm_to_cmip/blob/70f7dad381ef304b022826b540df0e7b46ba9912/e3sm_to_cmip/cmor_handlers/vars/phalf.py#L39-L50
Describe the solution you'd like
First, check of any other xarray-based package includes this feature.
- If exist - check if the implementation meets our requirements (based on
cdutil.vertical.reconstructPressureFromHybrid)- If meets our requirements -- just use this library API instead
- If does not meet our requirements -- open a new PR after #388 is merged
- If none exist -- open a new PR after #388 is merged
Describe alternatives you've considered
No response
Additional context
No response
Hey @jasonb5, can you follow the "Describe the solution you'd like" and see what we should do? If no API exists in another xarray-based package, I'm hoping to have this available by the next release around end of May.
e3sm_to_cmip and e3sm_diags are being refactored to use xarray/xcdat and both use cdutil.vertical.reconstructPressureFromHybrid() which blocks progress for these efforts.
@tomvothecoder I'll check this out and see if I can find anything existing otherwise I'll go off the original plan I had for this in #388.
Thanks @jasonb5!
I would happily merge this in cf-xarray. We already support a few ocean parametric coordinates. The code would go here https://github.com/xarray-contrib/cf-xarray/blob/c36a3687b3820176c394580f86a67ca53b9cadbd/cf_xarray/accessor.py#L2507
Hey @dcherian, yes @jasonb5 came across that feature in cf-xarray! He mentioned it might be a good opportunity to make a PR to cf-xarray to support decoding other parametric vertical coordinates.
Ideally, we'd like to have this feature out relatively soon so we can start refactoring some other packages with it. If cf-xarray has a quick turnaround time for releases, we can definitely contribute with a PR. Alternatively, we can implement this feature in xCDAT in the interim and then port it over to cf-xarray at a later date.
If cf-xarray has a quick turnaround time for releases
Yes, at the moment I just release when there's some nice new feature.
Hey @jasonb5 and @chengzhuzhang, I found these features from geocat-comp. Can you take a look and see if it fits our requirements so we don't end up duplicating functionalities?
- https://geocat-comp.readthedocs.io/en/latest/user_api/generated/geocat.comp.interpolation.interp_hybrid_to_pressure.html
- https://geocat-comp.readthedocs.io/en/latest/user_api/generated/geocat.comp.interpolation.interp_sigma_to_hybrid.html
Also MetPy has the log_interpolate_1d function:
- https://unidata.github.io/MetPy/latest/examples/sigma_to_pressure_interpolation.html#sphx-glr-examples-sigma-to-pressure-interpolation-py
- https://github.com/Unidata/MetPy/issues/1037
In testing xcdat v0.6.0, I wanted to be able to create the pressure coordinate from hybrid (generically across a bunch of models). I ended up creating the following function, which takes the embedded formula (e.g., ds.lev.formula) and converts it into a statement that can be executed by xarray (e.g., interprete_formula(ds.lev.formula) yields p = ds['a']*ds['p0']+ds['b']*ds['ps'] where ds.lev.formula = p = a*p0 + b*ps). So you can then do:
f = ds.plev.formula
fstring = interprete_formula(f)
exec(f)
print(p)
<xarray.DataArray (lev: 91, time: 1980, lat: 128, lon: 256)> ... Coordinates:
- lev (lev) float64 0.9988 0.9959 0.992 ... 5.61e-05 2.951e-05 9.869e-06
- lat (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
- lon (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
- time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
This is a clunky function (though it so far appears to be working on CMIP data...). I am absolutely sure you developers can come up with a better method for interpreting the hybrid formula on the fly, but I thought I'd put it here in case it is useful. I also recognize using exec/eval is not a best practice...maybe this can be avoided somehow (I have ideas if anyone ends up working on this).
def interprete_formula(f):
operator_chars = ['*', '(', ')', '+', '-', '/']
# first find/remove all parenthesis that are specifying axes
# p(n,k,j,i) = ap(k) + b(k)*ps(n,j,i) --> p = ap + b*ps
# get formula indices corresponding to a left hand parenthesis (
pinds = [i for i in range(len(f)) if f[i] == '(']
# initialize list to replace sections of formula (e.g., '(i,j,k)' --> '')
replace = []
# loop over left hand parenthesis instances
for p in pinds:
pref = '' # initialize parenthesis string
stopsearch = False # set stop flag to false
# loop from left hand parentheis to end of formula
for i in range(p, len(f)):
# if stopsearch, continue
if stopsearch:
continue
# get indexed character
c = f[i]
# check for close parenthesis )
# if closed, stopsearch
if c == ')':
pref += c # add character to parenthesis string
stopsearch = True
else:
pref += c # add character to parenthesis string
# check for arithmetic operators inside parenthesis string
ocs = [c for c in pref[1:-1] if c in operator_chars]
# if no arithmetic operators, queue parenthesis string for removal
if len(ocs) == 0:
replace.append(pref)
# remove all parenthesis strings with no arithmetic operators
for r in replace:
f = f.replace(r, '')
# further process formula
fout = '' # initialize blank formula
vid = '' # initialize blank variable
lhs = f.split('=')[0] # get lhs
f = f.split('=')[1] # get rhs
# loop over each char
for s in f:
# ignore blank spaces
if s == ' ':
continue
# if arithmetic operator, need to add to formula
if s in operator_chars:
# if a variable is not empty, write it out first
if len(vid) > 0:
fout = fout + 'ds["' + vid + '"]'
vid = '' # reset variable accumulator
# write out arithmetic operator
fout = fout + s
else:
# char must be part of a variable - add to variable accumulator
vid += s
# finished looping over all characters
# write out variable to formula
fout = fout + 'ds["' + vid + '"]'
# add lhs
fout = lhs + ' = ' + fout
# return formula
return fout
@pochedls @tomvothecoder Created PR in the cf_xarray repo https://github.com/xarray-contrib/cf-xarray/pull/528.
I think I have this started correctly. @jasonb5 – would decode_vertical_coords have worked for atmosphere_hybrid_sigma_pressure_coordinate before this PR (it is working now, which I assume is because of your PR)?
I pulled your PR (in my normal xcdat environment) import cf-xarray and cf.__path__ shows that I'm pointing to your PR). I then import xcdat (which I believe retains the updated cf-xarray import...).
Syntax for testers (i.e., note to self):
fn = '/p/css03/esgf_publish/CMIP6/CMIP/NASA-GISS/GISS-E3-G/amip/r1i1p3f1/CFmon/ta/gr/v20220707/ta_CFmon_GISS-E3-G_amip_r1i1p3f1_gr_197801-201412.nc'
ds = xc.open_mfdataset(fn)
ds.cf.decode_vertical_coords(outnames={'lev': 'plev'})
ds.plev
Summary
I made some progress on this. The main finding is that models with the formula p = ap + b*ps fail to decode (sometimes with an error and sometimes silently). I think this results because under CF conventions, vertical coordinates with the standard_name : atmosphere_hybrid_sigma_pressure_coordinate have two acceptable formulas (with different fields needed): p = ap + b*ps and p = a*p0 + b*ps.
I've broken the problem into two main chunks ("Decode Error" and "Silent Failures"):
Decode Error
The following models produce an error (Required terms p0 are absent in the dataset.): IPSL* / Can* / MPI* / EC-Earth3-AerChem. These models have standard_name : atmosphere_hybrid_sigma_pressure_coordinate with formula p = ap + b*ps.
UKESM* and HadGEM3-GC31-LL produce KeyError: 'standard_name' and have standard_name : atmosphere_hybrid_height_coordinate with formula z = a + b*orog.
Silent failures
A number of models silently fail (i.e., ds.cf.decode_vertical_coords(outnames={'lev': 'plev'}) executes, but does not produce a plev DataArray): CNRM-CM6-1, CNRM-ESM2-1, GFDL-CM4, IPSL-CM6A-MR1. These datasets have standard_name : atmosphere_hybrid_sigma_pressure_coordinate and formula p = ap + b*ps. They don't have p0 in the dataset, but for some reason don't produce the error as above.
CESM2-WACCM and CESM2 also silently fail with standard_name : alevel (and no listed formula).
IPSL-CM6A-LR also silently fails. It has vertical coordinate presnivs with standard name : 'Vertical levels' and no formula.
Dataset Listings
Datasets with no apparent problem
CNRM-CM5: /p/css03/cmip5_css01/data/cmip5/output1/CNRM-CERFACS/CNRM-CM5/amip/mon/atmos/cfMon/r1i1p1/v20111006/ta/ bcc-csm1-1-m: /p/css03/cmip5_css01/data/cmip5/output1/BCC/bcc-csm1-1-m/amip/mon/atmos/cfMon/r1i1p1/v20120910/ta/ GFDL-CM3: /p/css03/esgf_publish/cmip5/output1/NOAA-GFDL/GFDL-CM3/amip/mon/atmos/cfMon/r1i1p1/v20110601/ta/ CESM1-CAM5: /p/css03/cmip5_css01/data/cmip5/output1/NSF-DOE-NCAR/CESM1-CAM5/amip/mon/atmos/cfMon/r2i1p1/v20121129/ta/ HadGEM2-A: /p/css03/cmip5_css02/data/cmip5/output1/MOHC/HadGEM2-A/amip/mon/atmos/cfMon/r1i1p1/ta/1/ inmcm4: /p/css03/cmip5_css02/data/cmip5/output1/INM/inmcm4/amip/mon/atmos/cfMon/r1i1p1/ta/1/ MIROC5: /p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/amip/mon/atmos/cfMon/r1i1p1/ta/1/ MRI-CGCM3: /p/css03/esgf_publish/cmip5/output1/MRI/MRI-CGCM3/amip/mon/atmos/cfMon/r1i1p1/v20131011/ta/ CCSM4: /p/css03/esgf_publish/cmip5/gdo2/cmip5/output1/NCAR/CCSM4/amip/mon/atmos/cfMon/r7i1p1/v20121031/ta/ INM-CM5-0: /p/css03/esgf_publish/CMIP6/CMIP/INM/INM-CM5-0/amip/r1i1p1f1/CFmon/ta/gr1/v20190610/ INM-CM4-8: /p/css03/esgf_publish/CMIP6/CMIP/INM/INM-CM4-8/amip/r1i1p1f1/CFmon/ta/gr1/v20190528/ TaiESM1: /p/css03/esgf_publish/CMIP6/CMIP/AS-RCEC/TaiESM1/amip/r1i1p1f1/CFmon/ta/gn/v20200817/ MRI-ESM2-0: /p/css03/esgf_publish/CMIP6/CMIP/MRI/MRI-ESM2-0/amip/r2i1p1f1/CFmon/ta/gn/v20190722/ MIROC-ES2L: /p/css03/esgf_publish/CMIP6/CMIP/MIROC/MIROC-ES2L/amip/r3i1p1f2/CFmon/ta/gn/v20200903/ MIROC6: /p/css03/esgf_publish/CMIP6/CMIP/MIROC/MIROC6/amip/r2i1p1f1/CFmon/ta/gn/v20190311/ NorESM2-LM: /p/css03/esgf_publish/CMIP6/CMIP/NCC/NorESM2-LM/amip/r1i1p2f1/CFmon/ta/gn/v20210319/ BCC-CSM2-MR: /p/css03/esgf_publish/CMIP6/CFMIP/BCC/BCC-CSM2-MR/amip/r1i1p1f1/CFmon/ta/gn/v20190801/
Dataset with an error on ds.cf.decode_vertical_coords(outnames={'lev': 'plev'})
IPSL-CM5B-LR: /p/css03/cmip5_css01/data/cmip5/output1/IPSL/IPSL-CM5B-LR/amip/mon/atmos/cfMon/r1i1p1/v20120526/ta/ IPSL-CM5A-MR: /p/css03/cmip5_css01/data/cmip5/output1/IPSL/IPSL-CM5A-MR/amip/mon/atmos/cfMon/r1i1p1/v20120804/ta/ IPSL-CM5A-LR: /p/css03/cmip5_css01/data/cmip5/output1/IPSL/IPSL-CM5A-LR/amip/mon/atmos/cfMon/r1i1p1/v20111119/ta/ MPI-ESM-LR: /p/css03/esgf_publish/cmip5/output1/MPI-M/MPI-ESM-LR/amip/mon/atmos/cfMon/r1i1p1/v20120215/ta/ CanAM4: /p/css03/cmip5_css02/data/cmip5/output1/CCCma/CanAM4/amip/mon/atmos/cfMon/r1i1p1/ta/1/ UKESM1-0-LL: /p/css03/esgf_publish/CMIP6/CMIP/MOHC/UKESM1-0-LL/amip/r1i1p1f4/CFmon/ta/gn/v20210817/ HadGEM3-GC31-LL: /p/css03/esgf_publish/CMIP6/CMIP/MOHC/HadGEM3-GC31-LL/amip/r5i1p1f3/CFmon/ta/gn/v20210427/ CanESM5: /p/css03/esgf_publish/CMIP6/CMIP/CCCma/CanESM5/amip/r2i1p1f1/CFmon/ta/gn/v20190429/ MPI-ESM-1-2-HAM: /p/css03/esgf_publish/CMIP6/CMIP/HAMMOZ-Consortium/MPI-ESM-1-2-HAM/amip/r2i1p1f1/CFmon/ta/gn/v20190628/ EC-Earth3-AerChem: /p/css03/esgf_publish/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-AerChem/amip/r1i1p1f1/CFmon/ta/gr/v20200910/ MPI-ESM1-2-LR: /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/amip/r2i1p1f1/CFmon/ta/gn/v20190815/ MPI-ESM1-2-HR: /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/amip/r2i1p1f1/CFmon/ta/gn/v20190710/
Datasets with no error, but do not produce a plev field
CNRM-CM6-1: /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/amip/r1i1p1f2/CFmon/ta/gr/v20181203/ CNRM-ESM2-1: /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/amip/r1i1p1f2/CFmon/ta/gr/v20181205/ CESM2-WACCM: /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2-WACCM/amip/r2i1p1f1/CFmon/ta/gn/v20190220/ CESM2: /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2/amip/r1i1p1f1/CFmon/ta/gn/v20190218/ GFDL-CM4: /p/css03/esgf_publish/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/amip/r1i1p1f1/CFmon/ta/gr1/v20180701/ IPSL-CM6A-MR1: /p/css03/esgf_publish/CMIP6/CMIP/IPSL/IPSL-CM6A-MR1/amip/r1i1p1f1/CFmon/ta/gr/v20230220/ IPSL-CM6A-LR: /p/css03/esgf_publish/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/amip/r2i1p1f1/CFmon/ta/gr/v20181109/
Now that is amazing.
Given
The choice of whether a(k) or ap(k) is used depends on model formulation; the former is a dimensionless fraction, the latter a pressure value.
perhaps we can check units to decide which representation to use.
FYI – I had focused on testing atmospheric vertical coordinates. Ideally we could get some help testing ocean coordinates (@durack1?); otherwise, this might take awhile to stress test. Although I think CMIP data I've used is mostly on depth coordinates: are there CMIP ocean variables that are typically on coordinate systems that would need to be "decoded" (ocean sigma, s-coordinate, etc.)?