EAMxx: Suggested CF Metadata Improvements
@crterai @bartgol @chengzhuzhang @rljacob As requested by Jill, here are some metadata suggestions that would bring EAMxx closer to or beyond parity with EAM in terms of CF-compliance. This is not a ranked list.
- Add a CF "coordinates" attribute to each variable with an ncol dimension so the metadata looks like this:
float wind_speed_10m(time,ncol) ;
wind_speed_10m:units = "m s^-1" ;
wind_speed_10m:long_name = "wind_speed_10m" ;
wind_speed_10m:coordinates = "lat lon" ;
This is the recommended way to indicate which variables define the horizontal location in unstructured datasets because lat(ncol) and lon(ncol) are (in CF jargon) "auxiliary coordinate variables", not true 1-D coordinate variables whose names match their single dimension. EAM and MPAS should also do this, because it would make it easier for post-processing tools to figure out which variables specify the horizontal location. CF discourages guessing what variables (e.g., lat and lon) mean based simply on their names.
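The rule above can be sketched in plain Python. This is only an illustration of the selection logic, not EAMxx code: the function name `add_coordinates_attr`, the dict layout, and the `header` sample are hypothetical stand-ins for whatever the output layer actually uses to hold variable metadata.

```python
def add_coordinates_attr(variables, horiz_dim="ncol", coord_names=("lat", "lon")):
    """Return a copy of the metadata with a CF "coordinates" attribute added.

    `variables` maps each variable name to {"dims": tuple, "attrs": dict}.
    This layout is an assumption made for the sketch.
    """
    updated = {}
    for name, var in variables.items():
        attrs = dict(var["attrs"])  # copy so the input is left untouched
        # Tag every variable on the horizontal dimension, except the
        # auxiliary coordinate variables themselves.
        if horiz_dim in var["dims"] and name not in coord_names:
            attrs["coordinates"] = " ".join(coord_names)
        updated[name] = {"dims": tuple(var["dims"]), "attrs": attrs}
    return updated


# Hypothetical in-memory stand-in for a file header.
header = {
    "time": {"dims": ("time",), "attrs": {}},
    "lat": {"dims": ("ncol",), "attrs": {"units": "rad"}},
    "lon": {"dims": ("ncol",), "attrs": {"units": "rad"}},
    "wind_speed_10m": {"dims": ("time", "ncol"), "attrs": {"units": "m s^-1"}},
}
tagged = add_coordinates_attr(header)
print(tagged["wind_speed_10m"]["attrs"]["coordinates"])  # prints: lat lon
```

The check keys on the dimension name rather than the variable name, so lat and lon are only excluded because they are the coordinates being pointed to.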
- The monthly mean EAMxx outputs I have seen have pretty good metadata for the most commonly used variables. For the less commonly used variables the standard_name attribute is often "MISSING", and sometimes the long_name attribute is also "MISSING". Maybe the standard_name is hard to find for some variables, or EAMxx hasn't prioritized populating them all, so that's understandable. However, a variable's long_name can be anything that makes sense. The source code comment where the variable is first defined or used often makes a fine long_name. It's especially important for coordinate variables and for auxiliary coordinate variables to have standard_name defined because many tools use that to find the coordinates. To help focus this issue, here are some examples where standard_name should be defined (not "MISSING"):
zender@login39:~$ ncks -m -C -v lat,lon,snow_depth_land /global/cfs/cdirs/e3sm/beydoun/ne256pg2_ne256pg2.F20TR-SCREAMv1.rainfrac1.spanc1000.auto2700.acc150.n0128/run/1da.AVERAGE.ndays_x1.1995-01-01-00000.nc # Daily average
netcdf 1da.AVERAGE.ndays_x1.1995-01-01-00000 {
dimensions:
ncol = 1572864 ;
time = UNLIMITED ; // (1 currently)
variables:
float lat(ncol) ;
lat:units = "rad" ;
lat:_FillValue = 3.402824e+33f ;
lat:long_name = "MISSING" ;
lat:standard_name = "MISSING" ;
float lon(ncol) ;
lon:units = "rad" ;
lon:_FillValue = 3.402824e+33f ;
lon:long_name = "MISSING" ;
lon:standard_name = "MISSING" ;
float snow_depth_land(time,ncol) ;
snow_depth_land:units = "m" ;
snow_depth_land:_FillValue = 3.402824e+33f ;
snow_depth_land:long_name = "MISSING" ;
snow_depth_land:standard_name = "MISSING" ;
} // group /
zender@login39:~$
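To find every variable affected, a check along these lines could be run over a file's metadata. This is a rough sketch in plain Python: the function `find_missing_metadata` and the `dump` dict are hypothetical, with the attribute values copied from the three variables in the listing above.

```python
def find_missing_metadata(variables, attr_names=("long_name", "standard_name"),
                          placeholder="MISSING"):
    """Map each variable name to the attributes that are absent or placeholders.

    `variables` maps a variable name to its attribute dict (assumed layout).
    """
    report = {}
    for name, attrs in variables.items():
        # An attribute counts as missing if it is not set at all or still
        # holds the placeholder string.
        bad = [a for a in attr_names if attrs.get(a, placeholder) == placeholder]
        if bad:
            report[name] = bad
    return report


# Attribute values taken from the ncks listing above.
dump = {
    "lat": {"units": "rad", "long_name": "MISSING", "standard_name": "MISSING"},
    "lon": {"units": "rad", "long_name": "MISSING", "standard_name": "MISSING"},
    "snow_depth_land": {"units": "m", "long_name": "MISSING",
                        "standard_name": "MISSING"},
}
report = find_missing_metadata(dump)
for name, bad in report.items():
    print(name, "is missing:", ", ".join(bad))
```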
- There is a long discussion on whether single timestep output should be considered instantaneous or an average over the timestep, and therefore whether it makes sense to have a time bounds variable (usually time_bnds) in those files. Leaving aside the "instantaneous" issue for simplicity, CF recommends that the variables that do contain temporal averages also have the cell_methods attribute to show that they are averages and not instantaneous values in the time dimension. That would entail adding an attribute like this to all the time-varying variables in the hourly-mean, daily-mean, and monthly-mean (etc.) files:
variable_name:cell_methods = "time: mean" ;
EAMv3 does this for all time-mean files, so EAMxx would be at parity with EAM if it also adopted this CF convention.
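As a rough sketch of which variables would receive the attribute, here is the selection logic in plain Python. The function name, dict layout, and the `nbnd` dimension are hypothetical. Skipping the time coordinate and time_bnds, and leaving any existing cell_methods untouched, are assumptions of this sketch rather than anything EAMv3 is known to do.

```python
def add_time_mean_cell_methods(variables, time_dim="time",
                               skip=("time", "time_bnds")):
    """Return a copy of the metadata with cell_methods = "time: mean" added.

    `variables` maps each variable name to {"dims": tuple, "attrs": dict}.
    """
    updated = {}
    for name, var in variables.items():
        attrs = dict(var["attrs"])  # copy so the input is left untouched
        # Only time-varying data variables get the attribute; the time
        # coordinate and its bounds are skipped (an assumption of this sketch).
        if time_dim in var["dims"] and name not in skip:
            # setdefault keeps any cell_methods that is already present.
            attrs.setdefault("cell_methods", f"{time_dim}: mean")
        updated[name] = {"dims": tuple(var["dims"]), "attrs": attrs}
    return updated


# Hypothetical header of a daily-mean file; "nbnd" is an assumed dimension name.
daily_mean = {
    "time": {"dims": ("time",), "attrs": {}},
    "time_bnds": {"dims": ("time", "nbnd"), "attrs": {}},
    "lat": {"dims": ("ncol",), "attrs": {"units": "rad"}},
    "snow_depth_land": {"dims": ("time", "ncol"), "attrs": {"units": "m"}},
}
result = add_time_mean_cell_methods(daily_mean)
print(result["snow_depth_land"]["attrs"]["cell_methods"])  # prints: time: mean
```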
That's all I got for now. Feedback welcome!