xcdat icon indicating copy to clipboard operation
xcdat copied to clipboard

[Feature]: Consider adding vertical regridding feature to reconstruct pressure from hybrid

Open tomvothecoder opened this issue 2 years ago • 23 comments

Is your feature request related to a problem?

Related to #45 and #388.

Discussed in https://github.com/xCDAT/xcdat/discussions/440

Originally posted by tomvothecoder March 27, 2023 Hi @xCDAT/core-developers or anybody in the xCDAT community. Have any of you come across a package that implements an xarray-based function similar to cdutil.vertical.reconstructPressureFromHybrid? I am working on refactoring e3sm_diags and e3sm_to_cmip to use xarray/xCDAT and there are references to this function that I need to replace. Thanks!

Docstring:

def reconstructPressureFromHybrid(ps, A, B, Po):
    """
    Reconstruct the Pressure field on sigma levels, from the surface pressure
    :param Ps: Surface pressure
    :param A: Hybrid Conversion Coefficient, such as: p=B.ps+A.Po.
    :param B: Hybrid Conversion Coefficient, such as: p=B.ps+A.Po.
    :param Po: Hybrid Conversion Coefficient, such as: p=B.ps+A.Po
    :param Ps: surface pressure
    .. note::
        A and B are 1d sigma levels.
        Po and Ps must have same units.
    :returns: Pressure field, such as P=B*Ps+A*Po.
    :Example:
        .. doctest:: vertical_reconstructPressureFromHybrid
            >>> P=reconstructPressureFromHybrid(ps,A,B,Po)
    """
	...

Examples Usages:

  • https://github.com/E3SM-Project/e3sm_diags/blob/4a82b0ec3b2a81233b294fa505c6aa8228ecbd27/e3sm_diags/driver/utils/general.py#L132-L146
  • https://github.com/E3SM-Project/e3sm_to_cmip/blob/70f7dad381ef304b022826b540df0e7b46ba9912/e3sm_to_cmip/cmor_handlers/vars/pfull.py#L38-L47
  • https://github.com/E3SM-Project/e3sm_to_cmip/blob/70f7dad381ef304b022826b540df0e7b46ba9912/e3sm_to_cmip/cmor_handlers/vars/phalf.py#L39-L50

Describe the solution you'd like

First, check of any other xarray-based package includes this feature.

  • If exist - check if the implementation meets our requirements (based on cdutil.vertical.reconstructPressureFromHybrid)
    • If meets our requirements -- just use this library API instead
    • If does not meet our requirements -- open a new PR after #388 is merged
  • If none exist -- open a new PR after #388 is merged

Describe alternatives you've considered

No response

Additional context

No response

tomvothecoder avatar Apr 06 '23 16:04 tomvothecoder

Hey @jasonb5, can you follow the "Describe the solution you'd like" and see what we should do? If no API exists in another xarray-based package, I'm hoping to have this available by the next release around end of May.

e3sm_to_cmip and e3sm_diags are being refactored to use xarray/xcdat and both use cdutil.vertical.reconstructPressureFromHybrid() which blocks progress for these efforts.

tomvothecoder avatar Apr 06 '23 16:04 tomvothecoder

@tomvothecoder I'll check this out and see if I can find anything existing otherwise I'll go off the original plan I had for this in #388.

jasonb5 avatar Apr 07 '23 00:04 jasonb5

Thanks @jasonb5!

tomvothecoder avatar Apr 11 '23 16:04 tomvothecoder

I would happily merge this in cf-xarray. We already support a few ocean parametric coordinates. The code would go here https://github.com/xarray-contrib/cf-xarray/blob/c36a3687b3820176c394580f86a67ca53b9cadbd/cf_xarray/accessor.py#L2507

dcherian avatar Apr 15 '23 04:04 dcherian

Hey @dcherian, yes @jasonb5 came across that feature in cf-xarray! He mentioned it might be a good opportunity to make a PR to cf-xarray to support decoding other parametric vertical coordinates.

Ideally, we'd like to have this feature out relatively soon so we can start refactoring some other packages with it. If cf-xarray has a quick turnaround time for releases, we can definitely contribute with a PR. Alternatively, we can implement this feature in xCDAT in the interim and then port it over to cf-xarray at a later date.

tomvothecoder avatar Apr 17 '23 22:04 tomvothecoder

If cf-xarray has a quick turnaround time for releases

Yes, at the moment I just release when there's some nice new feature.

dcherian avatar Apr 18 '23 02:04 dcherian

Hey @jasonb5 and @chengzhuzhang, I found these features from geocat-comp. Can you take a look and see if it fits our requirements so we don't end up duplicating functionalities?

  • https://geocat-comp.readthedocs.io/en/latest/user_api/generated/geocat.comp.interpolation.interp_hybrid_to_pressure.html
  • https://geocat-comp.readthedocs.io/en/latest/user_api/generated/geocat.comp.interpolation.interp_sigma_to_hybrid.html

Also MetPy has the log_interpolate_1d function:

  • https://unidata.github.io/MetPy/latest/examples/sigma_to_pressure_interpolation.html#sphx-glr-examples-sigma-to-pressure-interpolation-py
  • https://github.com/Unidata/MetPy/issues/1037

tomvothecoder avatar May 25 '23 19:05 tomvothecoder

In testing xcdat v0.6.0, I wanted to be able to create the pressure coordinate from hybrid (generically across a bunch of models). I ended up creating the following function, which takes the embedded formula (e.g., ds.lev.formula) and converts it into a statement that can be executed by xarray (e.g., interprete_formula(ds.lev.formula) yields p = ds['a']*ds['p0']+ds['b']*ds['ps'] where ds.lev.formula = p = a*p0 + b*ps). So you can then do:

f = ds.plev.formula
fstring = interprete_formula(f)
exec(f)
print(p)

<xarray.DataArray (lev: 91, time: 1980, lat: 128, lon: 256)> ... Coordinates:

  • lev (lev) float64 0.9988 0.9959 0.992 ... 5.61e-05 2.951e-05 9.869e-06
  • lat (lat) float64 -88.93 -87.54 -86.14 -84.74 ... 86.14 87.54 88.93
  • lon (lon) float64 0.0 1.406 2.812 4.219 ... 354.4 355.8 357.2 358.6
  • time (time) object 1850-01-16 12:00:00 ... 2014-12-16 12:00:00

This is a clunky function (though it so far appears to be working on CMIP data...). I am absolutely sure you developers can come up with a better method for interpreting the hybrid formula on the fly, but I thought I'd put it here in case it is useful. I also recognize using exec/eval is not a best practice...maybe this can be avoided somehow (I have ideas if anyone ends up working on this).

def interprete_formula(f):
    operator_chars = ['*', '(', ')', '+', '-', '/']
    # first find/remove all parenthesis that are specifying axes
    # p(n,k,j,i) = ap(k) + b(k)*ps(n,j,i) --> p = ap + b*ps
    # get formula indices corresponding to a left hand parenthesis (
    pinds = [i for i in range(len(f)) if f[i] == '(']
    # initialize list to replace sections of formula (e.g., '(i,j,k)' --> '')
    replace = []
    # loop over left hand parenthesis instances
    for p in pinds:
        pref = ''  # initialize parenthesis string
        stopsearch = False  # set stop flag to false
        # loop from left hand parentheis to end of formula
        for i in range(p, len(f)):
            # if stopsearch, continue
            if stopsearch:
                continue
            # get indexed character
            c = f[i]
            # check for close parenthesis )
            # if closed, stopsearch
            if c == ')':
                pref += c # add character to parenthesis string
                stopsearch = True
            else:
                pref += c # add character to parenthesis string
        # check for arithmetic operators inside parenthesis string
        ocs = [c for c in pref[1:-1] if c in operator_chars]
        # if no arithmetic operators, queue parenthesis string for removal
        if len(ocs) == 0:
            replace.append(pref)
    # remove all parenthesis strings with no arithmetic operators
    for r in replace:
        f = f.replace(r, '')

    # further process formula
    fout = ''  # initialize blank formula
    vid = ''  # initialize blank variable
    lhs = f.split('=')[0] # get lhs
    f = f.split('=')[1] # get rhs
    # loop over each char
    for s in f:
        # ignore blank spaces
        if s == ' ':
            continue
        # if arithmetic operator, need to add to formula
        if s in operator_chars:
            # if a variable is not empty, write it out first
            if len(vid) > 0:
                fout = fout + 'ds["' + vid + '"]'
                vid = ''  # reset variable accumulator
            # write out arithmetic operator
            fout = fout + s
        else:
            # char must be part of a variable - add to variable accumulator
            vid += s
    # finished looping over all characters
    # write out variable to formula
    fout = fout + 'ds["' + vid + '"]'
    # add lhs
    fout = lhs + ' = ' + fout
    # return formula
    return fout

pochedls avatar Aug 03 '23 04:08 pochedls

@pochedls @tomvothecoder Created PR in the cf_xarray repo https://github.com/xarray-contrib/cf-xarray/pull/528.

jasonb5 avatar Jul 18 '24 17:07 jasonb5

I think I have this started correctly. @jasonb5 – would decode_vertical_coords have worked for atmosphere_hybrid_sigma_pressure_coordinate before this PR (it is working now, which I assume is because of your PR)?

I pulled your PR (in my normal xcdat environment) import cf-xarray and cf.__path__ shows that I'm pointing to your PR). I then import xcdat (which I believe retains the updated cf-xarray import...).

Syntax for testers (i.e., note to self):

fn = '/p/css03/esgf_publish/CMIP6/CMIP/NASA-GISS/GISS-E3-G/amip/r1i1p3f1/CFmon/ta/gr/v20220707/ta_CFmon_GISS-E3-G_amip_r1i1p3f1_gr_197801-201412.nc'
ds = xc.open_mfdataset(fn)
ds.cf.decode_vertical_coords(outnames={'lev': 'plev'})
ds.plev

pochedls avatar Jul 22 '24 17:07 pochedls

Summary

I made some progress on this. The main finding is that models with the formula p = ap + b*ps fail to decode (sometimes with an error and sometimes silently). I think this results because under CF conventions, vertical coordinates with the standard_name : atmosphere_hybrid_sigma_pressure_coordinate have two acceptable formulas (with different fields needed): p = ap + b*ps and p = a*p0 + b*ps.

I've broken the problem into two main chunks ("Decode Error" and "Silent Failures"):

Decode Error

The following models produce an error (Required terms p0 are absent in the dataset.): IPSL* / Can* / MPI* / EC-Earth3-AerChem. These models have standard_name : atmosphere_hybrid_sigma_pressure_coordinate with formula p = ap + b*ps.

UKESM* and HadGEM3-GC31-LL produce KeyError: 'standard_name' and have standard_name : atmosphere_hybrid_height_coordinate with formula z = a + b*orog.

Silent failures

A number of models silently fail (i.e., ds.cf.decode_vertical_coords(outnames={'lev': 'plev'}) executes, but does not produce a plev DataArray): CNRM-CM6-1, CNRM-ESM2-1, GFDL-CM4, IPSL-CM6A-MR1. These datasets have standard_name : atmosphere_hybrid_sigma_pressure_coordinate and formula p = ap + b*ps. They don't have p0 in the dataset, but for some reason don't produce the error as above.

CESM2-WACCM and CESM2 also silently fail with standard_name : alevel (and no listed formula).

IPSL-CM6A-LR also silently fails. It has vertical coordinate presnivs with standard name : 'Vertical levels' and no formula.

Dataset Listings

Datasets with no apparent problem

CNRM-CM5: /p/css03/cmip5_css01/data/cmip5/output1/CNRM-CERFACS/CNRM-CM5/amip/mon/atmos/cfMon/r1i1p1/v20111006/ta/ bcc-csm1-1-m: /p/css03/cmip5_css01/data/cmip5/output1/BCC/bcc-csm1-1-m/amip/mon/atmos/cfMon/r1i1p1/v20120910/ta/ GFDL-CM3: /p/css03/esgf_publish/cmip5/output1/NOAA-GFDL/GFDL-CM3/amip/mon/atmos/cfMon/r1i1p1/v20110601/ta/ CESM1-CAM5: /p/css03/cmip5_css01/data/cmip5/output1/NSF-DOE-NCAR/CESM1-CAM5/amip/mon/atmos/cfMon/r2i1p1/v20121129/ta/ HadGEM2-A: /p/css03/cmip5_css02/data/cmip5/output1/MOHC/HadGEM2-A/amip/mon/atmos/cfMon/r1i1p1/ta/1/ inmcm4: /p/css03/cmip5_css02/data/cmip5/output1/INM/inmcm4/amip/mon/atmos/cfMon/r1i1p1/ta/1/ MIROC5: /p/css03/cmip5_css02/data/cmip5/output1/MIROC/MIROC5/amip/mon/atmos/cfMon/r1i1p1/ta/1/ MRI-CGCM3: /p/css03/esgf_publish/cmip5/output1/MRI/MRI-CGCM3/amip/mon/atmos/cfMon/r1i1p1/v20131011/ta/ CCSM4: /p/css03/esgf_publish/cmip5/gdo2/cmip5/output1/NCAR/CCSM4/amip/mon/atmos/cfMon/r7i1p1/v20121031/ta/ INM-CM5-0: /p/css03/esgf_publish/CMIP6/CMIP/INM/INM-CM5-0/amip/r1i1p1f1/CFmon/ta/gr1/v20190610/ INM-CM4-8: /p/css03/esgf_publish/CMIP6/CMIP/INM/INM-CM4-8/amip/r1i1p1f1/CFmon/ta/gr1/v20190528/ TaiESM1: /p/css03/esgf_publish/CMIP6/CMIP/AS-RCEC/TaiESM1/amip/r1i1p1f1/CFmon/ta/gn/v20200817/ MRI-ESM2-0: /p/css03/esgf_publish/CMIP6/CMIP/MRI/MRI-ESM2-0/amip/r2i1p1f1/CFmon/ta/gn/v20190722/ MIROC-ES2L: /p/css03/esgf_publish/CMIP6/CMIP/MIROC/MIROC-ES2L/amip/r3i1p1f2/CFmon/ta/gn/v20200903/ MIROC6: /p/css03/esgf_publish/CMIP6/CMIP/MIROC/MIROC6/amip/r2i1p1f1/CFmon/ta/gn/v20190311/ NorESM2-LM: /p/css03/esgf_publish/CMIP6/CMIP/NCC/NorESM2-LM/amip/r1i1p2f1/CFmon/ta/gn/v20210319/ BCC-CSM2-MR: /p/css03/esgf_publish/CMIP6/CFMIP/BCC/BCC-CSM2-MR/amip/r1i1p1f1/CFmon/ta/gn/v20190801/

Dataset with an error on ds.cf.decode_vertical_coords(outnames={'lev': 'plev'})

IPSL-CM5B-LR: /p/css03/cmip5_css01/data/cmip5/output1/IPSL/IPSL-CM5B-LR/amip/mon/atmos/cfMon/r1i1p1/v20120526/ta/ IPSL-CM5A-MR: /p/css03/cmip5_css01/data/cmip5/output1/IPSL/IPSL-CM5A-MR/amip/mon/atmos/cfMon/r1i1p1/v20120804/ta/ IPSL-CM5A-LR: /p/css03/cmip5_css01/data/cmip5/output1/IPSL/IPSL-CM5A-LR/amip/mon/atmos/cfMon/r1i1p1/v20111119/ta/ MPI-ESM-LR: /p/css03/esgf_publish/cmip5/output1/MPI-M/MPI-ESM-LR/amip/mon/atmos/cfMon/r1i1p1/v20120215/ta/ CanAM4: /p/css03/cmip5_css02/data/cmip5/output1/CCCma/CanAM4/amip/mon/atmos/cfMon/r1i1p1/ta/1/ UKESM1-0-LL: /p/css03/esgf_publish/CMIP6/CMIP/MOHC/UKESM1-0-LL/amip/r1i1p1f4/CFmon/ta/gn/v20210817/ HadGEM3-GC31-LL: /p/css03/esgf_publish/CMIP6/CMIP/MOHC/HadGEM3-GC31-LL/amip/r5i1p1f3/CFmon/ta/gn/v20210427/ CanESM5: /p/css03/esgf_publish/CMIP6/CMIP/CCCma/CanESM5/amip/r2i1p1f1/CFmon/ta/gn/v20190429/ MPI-ESM-1-2-HAM: /p/css03/esgf_publish/CMIP6/CMIP/HAMMOZ-Consortium/MPI-ESM-1-2-HAM/amip/r2i1p1f1/CFmon/ta/gn/v20190628/ EC-Earth3-AerChem: /p/css03/esgf_publish/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-AerChem/amip/r1i1p1f1/CFmon/ta/gr/v20200910/ MPI-ESM1-2-LR: /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/amip/r2i1p1f1/CFmon/ta/gn/v20190815/ MPI-ESM1-2-HR: /p/css03/esgf_publish/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/amip/r2i1p1f1/CFmon/ta/gn/v20190710/

Datasets with no error, but do not produce a plev field

CNRM-CM6-1: /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/amip/r1i1p1f2/CFmon/ta/gr/v20181203/ CNRM-ESM2-1: /p/css03/esgf_publish/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/amip/r1i1p1f2/CFmon/ta/gr/v20181205/ CESM2-WACCM: /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2-WACCM/amip/r2i1p1f1/CFmon/ta/gn/v20190220/ CESM2: /p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2/amip/r1i1p1f1/CFmon/ta/gn/v20190218/ GFDL-CM4: /p/css03/esgf_publish/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/amip/r1i1p1f1/CFmon/ta/gr1/v20180701/ IPSL-CM6A-MR1: /p/css03/esgf_publish/CMIP6/CMIP/IPSL/IPSL-CM6A-MR1/amip/r1i1p1f1/CFmon/ta/gr/v20230220/ IPSL-CM6A-LR: /p/css03/esgf_publish/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/amip/r2i1p1f1/CFmon/ta/gr/v20181109/

pochedls avatar Jul 22 '24 21:07 pochedls

Now that is amazing.

Given

The choice of whether a(k) or ap(k) is used depends on model formulation; the former is a dimensionless fraction, the latter a pressure value.

perhaps we can check units to decide which representation to use.

dcherian avatar Jul 22 '24 22:07 dcherian

FYI – I had focused on testing atmospheric vertical coordinates. Ideally we could get some help testing ocean coordinates (@durack1?); otherwise, this might take awhile to stress test. Although I think CMIP data I've used is mostly on depth coordinates: are there CMIP ocean variables that are typically on coordinate systems that would need to be "decoded" (ocean sigma, s-coordinate, etc.)?

pochedls avatar Jul 22 '24 22:07 pochedls