CTSM icon indicating copy to clipboard operation
CTSM copied to clipboard

Validate the CMOR output of CTSM for CMIP7

Open ekluzek opened this issue 3 months ago • 43 comments

This comes from @jedwards4b in a discussion on TSS Software. He has a start on a formatter that translates time series data into CMOR format to be used for the CMIP7 simulations. This has the data names and metadata standardized across CMIP7 submissions.

Information on the CMOR standard is here:

https://cmor.llnl.gov

The most pressing thing it seems right now is to figure out how we inform CMOR variables about how they should be area averaged. Right now we don't do area averaging on variables, but some may need this for them to be used in CMIP7.

ekluzek avatar Sep 12 '25 19:09 ekluzek

Jim noted: I have converted my first land variable to cmor format, the result is here: /glade/derecho/scratch/cmip7/CMIP6/CMIP/NCAR/CESM2/esm-piControl-spinup/r1i1p1f1/Lmon/evspsblsoi/gr/evspsblsoi_Lmon_CESM2_esm-piControl-spinup_r1i1p1f1_000101-001012.nc

The original timeseries is: /glade/derecho/scratch/cmip7/archive/timeseries/b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192.wrkflw.1_32/lnd/hist/b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192.wrkflw.1.clm2.h0.QSOIL.000101-001012.nc

Please have a look at this file and let me know if anything is missing or you have any questions or concerns.


Here's an example dataset that @jedwards4b made for us to evaluate: (NOTE CMIP6 will change to CMIP7 and CLM5 will change to CTSM6 in the final version)

ncdump -h /glade/derecho/scratch/cmip7/CMIP6/CMIP/NCAR/CESM2/esm-piControl-spinup/r1i1p1f1/Lmon/evspsblsoi/gr/evspsblsoi_Lmon_CESM2_esm-piControl-spinup_r1i1p1f1_000101-001012.nc
netcdf evspsblsoi_Lmon_CESM2_esm-piControl-spinup_r1i1p1f1_000101-001012 {
dimensions:
        time = UNLIMITED ; // (120 currently)
        lat = 180 ;
        lon = 360 ;
        bnds = 2 ;
variables:
        double time(time) ;
                time:bounds = "time_bnds" ;
                time:units = "days since 0001-01-01 00:00:00" ;
                time:calendar = "noleap" ;
                time:axis = "T" ;
                time:long_name = "time" ;
                time:standard_name = "time" ;
        double time_bnds(time, bnds) ;
        double lat(lat) ;
                lat:bounds = "lat_bnds" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
                lat:long_name = "Latitude" ;
                lat:standard_name = "latitude" ;
        double lat_bnds(lat, bnds) ;
        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;
        double lon_bnds(lon, bnds) ;
        float evspsblsoi(time, lat, lon) ;
                evspsblsoi:standard_name = "water_evaporation_flux_from_soil" ;
                evspsblsoi:long_name = "Water Evaporation from Soil" ;
                evspsblsoi:comment = "Water evaporation from soil (including sublimation)." ;
                evspsblsoi:units = "kg m-2 s-1" ;
                evspsblsoi:cell_methods = "area: mean where land time: mean" ;
                evspsblsoi:cell_measures = "area: areacella" ;
                evspsblsoi:missing_value = 1.e+20f ;
                evspsblsoi:_FillValue = 1.e+20f ;
                evspsblsoi:history = "2025-09-08T14:50:29Z altered by CMOR: Converted type from \'d\' to \'f\'." ;

// global attributes:
                :Conventions = "CF-1.7 CMIP-6.2" ;
                :activity_id = "CMIP" ;
                :branch_method = "no parent" ;
                :branch_time_in_child = 0. ;
                :branch_time_in_parent = 0. ;
                :creation_date = "2025-09-08T14:50:29Z" ;
                :data_specs_version = "01.00.33" ;
                :experiment = "pre-industrial control simulation with CO2 concentration calculated (spin-up)" ;
                :experiment_id = "esm-piControl-spinup" ;
                :external_variables = "areacella" ;
                :forcing_index = 1 ;
                :frequency = "mon" ;
                :further_info_url = "https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.esm-piControl-spinup.none.r1i1p1f1" ;
                :grid = "data on a finite volume 1x1 degree grid interpolated from the native spectral element grid." ;
                :grid_label = "gr" ;
                :history = "2025-09-08T14:50:29Z ;rewrote data to be consistent with CMIP for variable evspsblsoi found in table Lmon." ;
                :initialization_index = 1 ;
                :institution = "National Center for Atmospheric Research, Climate and Global Dynamics Laboratory, 1850 Table Mesa Drive, Boulder, CO 80305, USA" ;
                :institution_id = "NCAR" ;
                :mip_era = "CMIP6" ;
                :nominal_resolution = "100 km" ;
                :parent_experiment_id = "no parent" ;
                :parent_mip_era = "no parent" ;
                :parent_source_id = "no parent" ;
                :parent_time_units = "no parent" ;
                :physics_index = 1 ;
                :product = "model-output" ;
                :project\ PI = "Stephen Yeager" ;
                :realization_index = 1 ;
                :realm = "land" ;
                :references = "https://ncar.ucar.edu/what-we-offer/models/community-earth-system-model-cesm" ;
                :source = "CESM2 (2018): \n",
                        "aerosol: MAM4 (same grid as atmos)\n",
                        "atmos: CAM6 (0.9x1.25 finite volume grid; 288 x 192 longitude/latitude; 32 levels; top level 2.25 mb)\n",
                        "atmosChem: MAM4 (same grid as atmos)\n",
                        "land: CLM5 (same grid as atmos)\n",
                        "landIce: CISM2.1\n",
                        "ocean: POP2 (320x384 longitude/latitude; 60 levels; top grid cell 0-10 m)\n",
                        "ocnBgchem: MARBL (same grid as ocean)\n",
                        "seaIce: CICE5.1 (same grid as ocean)" ;
                :source_id = "CESM2" ;
                :source_type = "AOGCM BGC" ;
                :sub_experiment = "none" ;
                :sub_experiment_id = "none" ;
                :table_id = "Lmon" ;
                :table_info = "Creation Date:(18 November 2020) MD5:1bc130d8458c2f5cf7e45c979681c644" ;
                :title = "CESM2 output prepared for CMIP6" ;
                :tracking_id = "hdl:21.14100/bb0daaab-4db7-4dd0-a86a-0b5f91071e9f" ;
                :variable_id = "evspsblsoi" ;
                :variant_label = "/glade/derecho/scratch/cmip7/CMIP6/CMIP/NCAR/CESM2/esm-piControl-spinup/r1i1p1f1/Lmon/evspsblsoi/gr/evspsblsoi_Lmon_CESM2_esm-piControl-spinup_r1i1p1f1" ;
                :license = "CMIP6 model data produced by the National Center for Atmospheric Research is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file)[ and at https://www.cesm.ucar.edu/working-groups/earth-system]. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
                :cmor_version = "3.11.0" ;
}

ekluzek avatar Sep 12 '25 19:09 ekluzek

Pinging: @wwieder @olyson @slevis-lmwg

ekluzek avatar Sep 12 '25 19:09 ekluzek

I'm a little unclear about the statement about area averaging in the subject of this issue, Erik. But in an email to @jedwards4b I did request information about area and landfrac on the regridded 1 degree output to help me evaluate results here. Typically these are provided as separate files in the cmorized data.

wwieder avatar Sep 15 '25 14:09 wwieder

From an email with @jedwards4b , Using weights file /glade/campaign/cesm/cesmdata/inputdata/cpl/gridmaps/ne30pg3/map_ne30pg3_to_1x1d_aave.nc And a bilinear technique from esmf. The code is in /glade/work/cmip7/e3sm_to_cmip/e3sm_to_cmip/cmor_handlers/regrid_se.py

wwieder avatar Sep 17 '25 16:09 wwieder

Also my response over email:

Great thanks Jim.

A few thoughts.
I got better results using the conservative regridding approach for land variables. I also found that I needed to use source and destination grid land masks to get the weights to behave nicely. If it's helpful, see this notebook /glade/u/home/wwieder/python/adf/scripts/regridding/regrid_conservative.ipynb This approach does end up producing "fuzzy" coastlines, but global and regional fluxes / states seem better preserved, and users can mask out coasts to get nice looking maps.

I'm including @billsacks here too, as he was helpful in thinking about how to do this.

wwieder avatar Sep 17 '25 16:09 wwieder

From conversation with @olyson and @slevis-lmwg today:

  • Keith will try using ncremap to do this too:
  • Will can try his script and compare all of the above
  • Jim I would still need information about landfrac and area on the destination grid

wwieder avatar Sep 17 '25 16:09 wwieder

Regridded QSOIL file using ncremap with Jim's mapping file: /glade/derecho/scratch/oleson/ANALYSIS/b.e30_beta06.B1850C_LTso.1deg.clm2.h0.evspsblsoi.000101-001012.nc

olyson avatar Sep 17 '25 16:09 olyson

I think @olyson used this weight file /glade/campaign/cesm/cesmdata/inputdata/cpl/gridmaps/ne30pg3/map_ne30pg3_to_1x1d_aave.nc.

It looks like this file uses ESMF_regrid_method = "First-order Conservative", with means that ncreamap would also be conservative? It does seem like there are differences along the coasts that would be good to understand?

Image

wwieder avatar Sep 17 '25 17:09 wwieder

This is the command I used:

ncremap -t 1 -P clm --sgs_frc=landfrac --sgs_msk=landmask -m /glade/campaign/cesm/cesmdata/inputdata/cpl/gridmaps/ne30pg3/map_ne30pg3_to_1x1d_aave.nc /glade/derecho/scratch/cmip7/archive/timeseries/b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192.wrkflw.1_32/lnd/hist/b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192.wrkflw.1.clm2.h0.QSOIL.000101-001012.nc b.e30_beta06.B1850C_LTso.1deg.clm2.h0.QSOIL.000101-001012.nc

Info from the nco user's guide regarding the -P clm --sgs_frc=landfrac --sgs_msk=landmask options:

"-P prc_typ" ‘elm’ or ‘clm’ for DOE ELM and NCAR CLM model data. The ELM, CLM, and CICE procedures set idiosyncratic model values and then invoke the Sub-gridscale (SGS) procedure (see below). SGS-mode: ncremap has a sub-gridscale (SGS) mode that performs the special pre-processing and weighting necessary to to conserve fields that represent fractional spatial portions of a gridcell, and/or fractional temporal periods of the analysis. Spatial fields output by most geophysical models are intensive, and so by default the regridder attempts to conserve the integral of the area times the field value such that the integral is equal on source and destination grids. However some models (like ELM, CLM, CICE, and MPAS-Seaice) output gridcell values intended to apply to only a fraction sgs_frc (for “sub-gridscale fraction”) of the gridcell. The sub-gridscale (SGS) fraction usually changes spatially with the distribution of land and ocean, and spatiotemporally with the distribution of sea ice and possibly vegetation. For concreteness consider a sub-grid field that represents the land fraction. Land fraction is less than one in gridcells that resolve coastlines or islands. ELM and CLM happily output temperature values valid only for a small (i.e., sgs_frc << 1) island within the larger gridcell. Model architecture dictates this behavior and savvy researchers expect it. The goal of the NCO weight-application algorithm is to treat SGS fields as seamlessly as possible so that those less familiar with sub-gridscale models can easily regrid them correctly.

Fortunately, models like ELM and CLM that run on the same horizontal grid as the overlying atmosphere can use the same mapping-file as the atmosphere, so long as the SGS weight-application procedure is invoked. Not invoking an SGS-aware weight application algorithm is equivalent to assuming sgs_frc = 1 everywhere. Regridding sub-grid values correctly versus incorrectly (e.g., with and without SGS-mode) alters global-mean answers for land-based quantities by about 1% for horizontal grid resolutions of about one degree. The resulting biases are in intricately shaped regions (coasts, lakes, sea-ice floes) and so are easy to overlook.

To invoke SGS mode and correctly regrid sub-gridscale data, specify the names of the fractional area sgs_frc and, if applicable, the mask variable sgs_msk (strictly, this is only necessary if these names differ from their respective defaults landfrac and landmask).

The area and sgs_frc fields in the regridded file will be in units of sterradians and fraction, respectively. However, ncremap offers custom options to reproduce the idiosyncratic data and metadata format of two particular models, ELM and CICE. When invoked with ‘-P elm’ (or ‘-P clm’), a final step converts the output area from sterradians to square kilometers.

olyson avatar Sep 17 '25 17:09 olyson

@wwieder @olyson I am using xesmf for regridding. I don't see the subgrid commands in that tool. Am I missing something? Am I forced to use ncremap?

jedwards4b avatar Sep 17 '25 17:09 jedwards4b

I think that I see what to do, I'll get back to you. Thanks

jedwards4b avatar Sep 17 '25 18:09 jedwards4b

Here is the latest attempt, I realize that the auxiliary variables are missing again.
/glade/derecho/scratch/cmip7/CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/evspsblsoi/gr/evspsblsoi_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc

jedwards4b avatar Oct 03 '25 20:10 jedwards4b

I now have the landfrac in /glade/derecho/scratch/cmip7/CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/fx/sftlf/gr/sftlf_fx_CESM2_piControl_r1i1p1f1.nc

jedwards4b avatar Oct 06 '25 15:10 jedwards4b

Thanks, I'll take a look this week.

wwieder avatar Oct 06 '25 15:10 wwieder

Now added areacella: /glade/derecho/scratch/cmip7/CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/fx/areacella/gr/areacella_fx_CESM2_piControl_r1i1p1f1.nc

jedwards4b avatar Oct 06 '25 18:10 jedwards4b

Hey Jim. Global quantities seem poorly conserved with the regridding. Below is the global weighed sum of the flux from the raw data (ne30 in blue) vs. the regridded flux (orange). This is kind of trick to diagnose, but I'll try some things.

Image

wwieder avatar Oct 17 '25 15:10 wwieder

OK, I think the areas on the regular grid are likely wonky. I wouldn't expect this on a regular grid, would you?

Image

wwieder avatar Oct 17 '25 15:10 wwieder

Hi Will - thanks for having a look. Please make suggestions for improvement. The code repo is https://github.com/ESMCI/cmip7-prep/. I've done a lot of work on it since producing that file, though nothing that should make any changes to the outcome. I should be able to produce a full set of Lmon output again soon.

jedwards4b avatar Oct 17 '25 16:10 jedwards4b

It seems like the script must just be regridding the ne30 area to the regular grid, which doesn't seem to be working as intended, but your have a way to calculate the gridcell area on a regular grid with compute_areacella_from_bounds. I wonder if you get rid of the if statement in line 822 the assigned areas will look as expected?

That said, I'm also concerned that the script seems to just be regridding landfrac as the variables are being done. This also seems problematic because Sean and Bill were both clear that to get conservative fluxes you needed to know landfrac on both source and destination grids. Calculating the desitination grid land frac on the fly like this seems potentially problematic. Is there a target 1 degree grid and land mask that you can use for your destination grid here, instead of calculating it on the fly?

@billsacks do you have suggestions here?

wwieder avatar Oct 17 '25 17:10 wwieder

@wwieder (and @jedwards4b ) - I agree that we want areas on the destination grid to be coming from an actual calculation of those areas, e.g., from ESMF: I think that a remapping of areas from source to destination will generally not be what you want.

I'm less clear on what's right about landfrac. Digging up emails from January, it looks like then I was suggesting regridding landfrac from source to destination. I'd need to get my head back into this. Maybe easiest to talk through.... @wwieder , since you and I spent so long on getting something right last winter, my first inclination would be to trust whatever method you settled on and we looked at together then.... But maybe we can talk more about this? If it can wait until next week, that would be best for me... my schedule is pretty open next week.

billsacks avatar Oct 17 '25 18:10 billsacks

Thanks for looking into this, I am working with Mariana on a CMEPS PR this afternoon, but will address this early next week.

jedwards4b avatar Oct 17 '25 18:10 jedwards4b

Thanks for jogging my memory, @billsacks. I think you're right we did regrid landfrac from the source to destination grid, as @jedwards4b is doing in his script. I think the only issue is then with how area's being handled (https://github.com/ESCOMP/CTSM/issues/3485#issuecomment-3416529091).

I think there is an additional complication of having to normalize fluxes by the landfrac on both source and destination grids, which I didn't see in your script either, Jim. See this notebook, see steps in cell_#2. Hopefully this gets us closer!

wwieder avatar Oct 17 '25 21:10 wwieder

@wwieder thank you for the suggestions. Please have a look at

/glade/derecho/scratch/jedwards/
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/fx/areacella/gr/areacella_fx_CESM2_piControl_r1i1p1f1.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/fx/sftlf/gr/sftlf_fx_CESM2_piControl_r1i1p1f1.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/evspsblveg/gr/evspsblveg_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/mrfso/gr/mrfso_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/evspsblsoi/gr/evspsblsoi_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/mrso/gr/mrso_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/mrro/gr/mrro_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/lai/gr/lai_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/mrsos/gr/mrsos_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc
CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/Lmon/mrros/gr/mrros_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc

let me know if I'm any closer.

jedwards4b avatar Oct 20 '25 15:10 jedwards4b

Good news: area looks more like I expected

Image

Bad news: global quantities are still not being conserved

Image

It seems like you're multiplying history variables by the source (ne30) land frac in your script and then dividing the regridded variable by the destination grid (1 deg) land frac here.

The only difference I can see in my notebook is that I set missing values for history variables and landfrac to zero before regridding. Maybe this influences how quantities are being regridded?

# Fill in  missing values with zeros
ds_con['GPP'] = ds_con['GPP'].fillna(0) 
ds_con['landfrac']= ds_con['landfrac'].fillna(0)  
ds_con['GPP'] = ds_con.GPP * ds_con.landfrac # weight flux by land frac

Otherwise, I'm assuming you're using a weights file that uses a conservative method and then using conservative mapping in your regirdder (but I'm not totally following what **kwargs are being passed in your regridding step for land variables?

wwieder avatar Oct 21 '25 14:10 wwieder

kwargs is only used to set output_chunks which is a dask decomposition method. Looking at my code it seems like I am explicitly handling NaN values correctly, but I'll give it another try after resetting them to 0. Thanks

jedwards4b avatar Oct 21 '25 14:10 jedwards4b

@wwieder I have taken your suggestion to replace nan with 0 prior to normalizing. Please check again.

jedwards4b avatar Oct 21 '25 19:10 jedwards4b

Looking better! I can dig more into this, but it may not be until Friday at the earliest. Is this OK Jim?

Image

wwieder avatar Oct 21 '25 20:10 wwieder

@wwieder thanks - I can take another stab at it this week if the script you are using to make that plot is easy, I can run it myself? Otherwise I'm out next week so it'll be November before we get back to it.

jedwards4b avatar Oct 21 '25 20:10 jedwards4b

my notebook is here /glade/u/home/wwieder/python/ctsm_py/notebooks/Compare_grids.ipynb

you'll need a conda environment with uxarray to make it work I'm only running the first 3 cells to make this global annual timeseries.

I'd like to work on looking at regional plots, as coastlines can look odd with the regridded data, but this is a WIP.

wwieder avatar Oct 21 '25 22:10 wwieder

if you don't want to look at a notebook, this is all I'm doing:

import xarray as xr
import uxarray as ux
import numpy as np
import pandas as pd

# some resources for plotting
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.path as mpath
import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import os,sys
from glob import glob
from os.path import join
%matplotlib inline

## read in raw data, ne30 grid
raw_data = '/glade/derecho/scratch/cmip7/archive/timeseries/b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192.wrkflw.1/lnd/hist/b.e30_beta06.B1850C_LTso.ne30_t232_wgx3.192.wrkflw.1.clm2.h0.QSOIL.000101-001012.nc'
mesh = '/glade/campaign/cesm/cesmdata/inputdata/share/meshes/ne30pg3_ESMFmesh_cdf5_c20211018.nc'
ux_ds = ux.open_dataset(mesh, raw_data)

## read in regridded data
regrid_path = '/glade/derecho/scratch/jedwards/CMIP7/CMIP6/CMIP/NCAR/CESM2/piControl/r1i1p1f1/'
xr_ds = 'Lmon/evspsblsoi/gr/evspsblsoi_Lmon_CESM2_piControl_r1i1p1f1_000101-001012.nc'
xr_landfrac = '/fx/sftlf/gr/sftlf_fx_CESM2_piControl_r1i1p1f1.nc'
xr_area = 'fx/areacella/gr/areacella_fx_CESM2_piControl_r1i1p1f1.nc'
print(regrid_path+xr_area)
xr_ds = xr.open_dataset(regrid_path+xr_ds, decode_times=True)
xr_landfrac = xr.open_dataset(regrid_path+xr_landfrac, decode_times=True)
xr_area = xr.open_dataset(regrid_path+xr_area, decode_times=True)

# Plot area weighted sums
ux_ds['wgt'] = (ux_ds.area * ux_ds.landfrac)/(ux_ds.area * ux_ds.landfrac).sum()
plt.plot((ux_ds.QSOIL * ux_ds.wgt).sum('n_face').values, label='ne30') 

xr_ds['wgt'] = (xr_area.areacella * xr_landfrac.sftlf)/(xr_area.areacella * xr_landfrac.sftlf).sum()
plt.plot((xr_ds.evspsblsoi * xr_ds.wgt).sum(['lat','lon']).values, label='regridded') 
ylab = str(xr_ds.evspsblsoi.attrs['long_name'])
plt.ylabel = ylab
plt.legend()

wwieder avatar Oct 21 '25 22:10 wwieder