EAMxx: Suggested CF Metadata Improvements

Open czender opened this issue 7 months ago • 0 comments

@crterai @bartgol @chengzhuzhang @rljacob As requested by Jill, here are some metadata suggestions that would bring EAMxx closer to or beyond parity with EAM in terms of CF-compliance. This is not a ranked list.

  1. Add a CF "coordinates" attribute to each variable with an ncol dimension so the metadata looks like this:
float wind_speed_10m(time,ncol) ;
    wind_speed_10m:units = "m s^-1" ;
    wind_speed_10m:long_name = "wind_speed_10m" ;
    wind_speed_10m:coordinates = "lat lon" ;

This is the recommended way to indicate which variables define the horizontal location in unstructured datasets, because lat(ncol) and lon(ncol) are (in CF jargon) "auxiliary coordinate variables", not true 1-D coordinate variables whose names match their single dimension. EAM and MPAS should also do this, because it would make it easier for post-processing tools to figure out which variables specify the horizontal location. CF discourages guessing what variables (e.g., lat and lon) mean based simply on their names.
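As a rough illustration of the rule in item 1, here is a minimal Python sketch that adds the attribute to every variable with an ncol dimension. The dict layout and the function name `add_coordinates_attribute` are made up for this example (real code would go through a netCDF library or the model's output layer), and skipping lat and lon themselves is my assumption, since they are the auxiliary coordinates being pointed to.

```python
def add_coordinates_attribute(variables, horizontal_dim="ncol",
                              coord_names=("lat", "lon")):
    """Add coordinates="lat lon" to variables on the horizontal dimension.

    `variables` is a hypothetical in-memory stand-in for a file header:
    {name: {"dims": tuple_of_dim_names, "attrs": dict_of_attributes}}.
    Returns the names of the variables that were updated.
    """
    value = " ".join(coord_names)
    updated = []
    for name, var in variables.items():
        if name in coord_names:
            continue  # assumption: the auxiliary coordinates are not tagged
        if horizontal_dim not in var["dims"]:
            continue  # no horizontal location to describe
        if "coordinates" not in var["attrs"]:
            var["attrs"]["coordinates"] = value
            updated.append(name)
    return updated


# Toy header modeled on the example above.
variables = {
    "lat": {"dims": ("ncol",), "attrs": {"units": "rad"}},
    "lon": {"dims": ("ncol",), "attrs": {"units": "rad"}},
    "time": {"dims": ("time",), "attrs": {}},
    "wind_speed_10m": {
        "dims": ("time", "ncol"),
        "attrs": {"units": "m s^-1", "long_name": "wind_speed_10m"},
    },
}
updated = add_coordinates_attribute(variables)
print(updated)
```

Variables that already carry a coordinates attribute are left alone, so the function can be run repeatedly on the same header.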

  2. The monthly mean EAMxx outputs I have seen have pretty good metadata for the most commonly used variables. For the less commonly used variables the standard_name attribute is often "MISSING", and sometimes the long_name attribute is also "MISSING". Maybe the standard_name is hard to find for some variables, or EAMxx hasn't prioritized populating them all, so that's understandable. However, a variable's long_name can be anything that makes sense. The source code comment where the variable is first defined or used often makes a fine long_name. It's especially important for coordinate variables and for auxiliary coordinate variables to have standard_name defined because many tools use that to find the coordinates. To help focus this issue, here are some examples where standard_name should be defined (not "MISSING"):
zender@login39:~$ ncks -m -C -v lat,lon,snow_depth_land /global/cfs/cdirs/e3sm/beydoun/ne256pg2_ne256pg2.F20TR-SCREAMv1.rainfrac1.spanc1000.auto2700.acc150.n0128/run/1da.AVERAGE.ndays_x1.1995-01-01-00000.nc # Daily average
netcdf 1da.AVERAGE.ndays_x1.1995-01-01-00000 {
  dimensions:
    ncol = 1572864 ;
    time = UNLIMITED ; // (1 currently)

  variables:
    float lat(ncol) ;
      lat:units = "rad" ;
      lat:_FillValue = 3.402824e+33f ;
      lat:long_name = "MISSING" ;
      lat:standard_name = "MISSING" ;

    float lon(ncol) ;
      lon:units = "rad" ;
      lon:_FillValue = 3.402824e+33f ;
      lon:long_name = "MISSING" ;
      lon:standard_name = "MISSING" ;

    float snow_depth_land(time,ncol) ;
      snow_depth_land:units = "m" ;
      snow_depth_land:_FillValue = 3.402824e+33f ;
      snow_depth_land:long_name = "MISSING" ;
      snow_depth_land:standard_name = "MISSING" ;
} // group /
zender@login39:~$ 
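A check along these lines could flag the placeholders automatically. This is only a sketch: the `header` dict is transcribed by hand from the dump above, `find_missing_metadata` is a hypothetical helper, and treating an absent attribute the same as "MISSING" is my assumption. The suggested values "latitude" and "longitude" are the CF standard names for those quantities; no suggestion is made for snow_depth_land.

```python
def find_missing_metadata(variables, attrs=("long_name", "standard_name"),
                          placeholder="MISSING"):
    """Return (variable, attribute) pairs whose value is absent or a placeholder.

    `variables` is a hypothetical stand-in for a file header:
    {variable_name: {attribute_name: value}}.
    """
    problems = []
    for name, var_attrs in variables.items():
        for attr in attrs:
            # Assumption: an absent attribute counts the same as "MISSING".
            if var_attrs.get(attr, placeholder) == placeholder:
                problems.append((name, attr))
    return problems


# Attributes copied from the ncks dump above (fill values omitted).
header = {
    "lat": {"units": "rad", "long_name": "MISSING", "standard_name": "MISSING"},
    "lon": {"units": "rad", "long_name": "MISSING", "standard_name": "MISSING"},
    "snow_depth_land": {"units": "m", "long_name": "MISSING",
                        "standard_name": "MISSING"},
}
problems = find_missing_metadata(header)

# CF standard names for the two auxiliary coordinate variables.
SUGGESTED_STANDARD_NAMES = {"lat": "latitude", "lon": "longitude"}
for name, standard_name in SUGGESTED_STANDARD_NAMES.items():
    header[name]["standard_name"] = standard_name

remaining = find_missing_metadata(header)
print(len(problems), len(remaining))
```

After the two standard names are filled in, the remaining entries are the long_name placeholders plus the standard_name for snow_depth_land, which still need a human decision.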
  3. There has been a long discussion on whether single timestep output should be considered as instantaneous or an average over the timestep, and therefore whether it makes sense to have a time bounds variable (usually time_bnds) in those files. Leaving aside the "instantaneous" issue for simplicity, CF recommends that the variables that do contain temporal averages also have the cell_methods attribute to show that they are averages and not instantaneous values in the time dimension. That would entail adding an attribute like this to all the time-varying variables in the hourly-mean, daily-mean, and monthly-mean (etc.) files:
variable_name:cell_methods = "time: mean" ;

EAMv3 does this for all time-mean files, so EAMxx would be at parity with EAM if it also adopted this CF convention.
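For concreteness, the rule in item 3 could be sketched as below. Again the dict layout and the name `add_time_mean_cell_methods` are hypothetical, and the choices to skip the time coordinate, skip time_bnds, and leave any existing cell_methods untouched are my assumptions, not something EAM or EAMxx is known to do. It is meant for time-mean files only, since instantaneous output is the open question set aside above.

```python
def add_time_mean_cell_methods(variables, time_dim="time",
                               skip=("time", "time_bnds")):
    """Tag time-varying variables in a time-mean file with cell_methods.

    `variables` is a hypothetical stand-in for a file header:
    {name: {"dims": tuple_of_dim_names, "attrs": dict_of_attributes}}.
    Returns the names of the variables that were updated.
    """
    updated = []
    for name, var in variables.items():
        if name in skip:
            continue  # assumption: the coordinate and its bounds are not tagged
        if time_dim not in var["dims"]:
            continue  # time-invariant, e.g. lat(ncol)
        if "cell_methods" in var["attrs"]:
            continue  # assumption: never overwrite an existing value
        var["attrs"]["cell_methods"] = f"{time_dim}: mean"
        updated.append(name)
    return updated


# Toy header for a daily-mean file.
daily_mean = {
    "time": {"dims": ("time",), "attrs": {}},
    "time_bnds": {"dims": ("time", "nbnd"), "attrs": {}},
    "lat": {"dims": ("ncol",), "attrs": {}},
    "snow_depth_land": {"dims": ("time", "ncol"), "attrs": {"units": "m"}},
}
tagged = add_time_mean_cell_methods(daily_mean)
print(tagged)
```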

That's all I got for now. Feedback welcome!

czender, Jun 04 '25 23:06