cf-xarray
Add support for Discrete Sampling Geometry datasets.
I ran into this while trying to set up a dataset with a collection of vertical profiles (i.e., a transect).
I think we should consider adding a new axis named "discrete" (http://cfconventions.org/cf-conventions/cf-conventions.html#discrete-axis). The dimensions of the discrete axis would then be defined by an attribute named "instance_dimension" (http://cfconventions.org/cf-conventions/cf-conventions.html#collections-instances-elements). The "instance_dimension" attribute is assigned to all "index_variables", which are 1D coordinates.
This is an example where I've extracted a transect from a C-grid model. After extraction I removed the "axis" attributes from all X and Y variables, and I added the attribute da.attrs["instance_dimension"] = "station" to all 1D variables with dimension "station".
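The tagging step can be sketched roughly as follows. The dataset is a made-up toy transect (the variable names, sizes, and values are assumptions, not the actual model output), and the attribute is used the way this post proposes, which may not match every reading of the CF text.

```python
import numpy as np
import xarray as xr

# Toy transect: 3 stations, 4 vertical levels (all values are made up).
ds = xr.Dataset(
    data_vars={"temp": (("station", "z"), np.zeros((3, 4)))},
    coords={
        "lon": ("station", [10.0, 10.5, 11.0], {"axis": "X"}),
        "lat": ("station", [60.0, 60.1, 60.2], {"axis": "Y"}),
        "z": ("z", [0.0, -10.0, -20.0, -30.0], {"axis": "Z"}),
    },
)

# Drop the horizontal "axis" attribute and tag every 1D variable
# whose only dimension is "station".
for name in list(ds.variables):
    if ds[name].dims == ("station",):
        ds[name].attrs.pop("axis", None)
        ds[name].attrs["instance_dimension"] = "station"

# Names of the variables that now carry the tag.
tagged = sorted(
    str(name)
    for name in ds.variables
    if ds[name].attrs.get("instance_dimension") == "station"
)
```

Only lon and lat are tagged here; z keeps its "axis" attribute because it does not lie along "station".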
In this scenario, I think the axes should be "Z", "T", and "discrete", where ds.axes["discrete"] = ["station"]. Then we should probably also add ds.cf.index_variables, which returns all index variables (e.g., lon, lat, label, indexes on the original grid, ...).
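To make the proposal concrete, here is a minimal sketch of the two lookups written as plain functions. Both index_variables and discrete_axis are hypothetical helpers invented for illustration; they are not part of cf-xarray, and the toy dataset is an assumption.

```python
import numpy as np
import xarray as xr


def index_variables(ds):
    """Hypothetical helper: names of variables carrying instance_dimension."""
    return sorted(
        str(name)
        for name, var in ds.variables.items()
        if "instance_dimension" in var.attrs
    )


def discrete_axis(ds):
    """Hypothetical helper: dimensions named by any instance_dimension attribute."""
    return sorted(
        {
            var.attrs["instance_dimension"]
            for var in ds.variables.values()
            if "instance_dimension" in var.attrs
        }
    )


# Toy dataset (made up): three 1D variables tagged along "station".
tag = {"instance_dimension": "station"}
ds = xr.Dataset(
    {"temp": (("station", "z"), np.zeros((2, 3)))},
    coords={
        "lon": ("station", [10.0, 10.5], tag),
        "lat": ("station", [60.0, 60.1], tag),
        "label": ("station", ["A", "B"], tag),
    },
)

axes = {"discrete": discrete_axis(ds)}
names = index_variables(ds)
```

With this toy dataset, axes maps "discrete" to ["station"] and names holds label, lat, and lon.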
There is a global attribute named "featureType": http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types
Not sure whether it would be preferable to add ds.cf["discrete"] and ds.cf.index_variables only if the attribute is present. Maybe the axes ["X", "Y"] and "discrete" should also be mutually exclusive?
Yes we should support this "discrete sampling geometry" stuff.
I am confused about how these are represented, however. The CF CDL examples don't use the discrete attribute (http://cfconventions.org/cf-conventions/cf-conventions.html#_indexed_ragged_array_representation_of_trajectories), but instance_dimension is used. cf_role also seems important.
@ocefpaf Can you point us to a "nice" dataset that uses these attributes?
I believe we have some "gold standards" somewhere. Let me check and get back to you.
@dcherian and @ocefpaf, we have USGS oceanographic data in CF-1.6 compliant format, both featureType: timeSeries and featureType: timeSeriesProfile data, on our THREDDS server, where you can download the data as NetCDF or access it via OPeNDAP.
For example, all of the data from this experiment in Grand Bay (thanks to @dnowacki-usgs):
Specific Examples:
Thanks @rsignell-usgs, those datasets are a lot more straightforward.
instance_dimension is still confusing to me, but it is used to represent ragged arrays. For xarray, we could decode this to either a MultiIndexed dataset or a sparse array dataset.
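As a rough illustration of the two decoding options, here is a sketch on a made-up indexed ragged array. The variable names (profile_index, depth, temp) and values are assumptions, and a dense NaN-padded array stands in for the sparse option, since a real sparse decode would need an extra dependency.

```python
import numpy as np
import xarray as xr

# Made-up indexed ragged array: 5 observations belonging to 2 profiles.
ds = xr.Dataset(
    {"temp": ("obs", [10.0, 9.5, 9.0, 12.0, 11.0])},
    coords={
        # Index variable: which profile each observation belongs to.
        "profile_index": (
            "obs",
            [0, 0, 0, 1, 1],
            {"instance_dimension": "profile"},
        ),
        "depth": ("obs", [0.0, 10.0, 20.0, 0.0, 10.0]),
    },
)

# Option 1: keep the 1D layout, with a MultiIndex over
# (profile_index, depth) along "obs".
indexed = ds.set_index(obs=["profile_index", "depth"])

# Option 2: unstack into a 2D (profile_index, depth) array.
# Missing combinations are filled with NaN; a sparse array would
# avoid storing those explicitly.
dense = indexed["temp"].unstack("obs")
```

The second profile has no observation at 20 m, so that cell comes out as NaN in the dense form, which is exactly the padding a sparse representation would avoid.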
OTOH, cf_role tagged variables provide a unique identifier for a "trajectory", so if you were concatenating multiple trajectory files, you would create a new coordinate for this cf_role variable and concatenate along that. I think we can support indexing by cf_role. The only valid keys are trajectory_id, timeseries_id, and profile_id.
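A minimal sketch of both ideas, lookup by cf_role and concatenation along the tagged variable, might look like this. by_cf_role and make_file are hypothetical helpers, not existing cf-xarray API, and the two "files" are made-up in-memory datasets.

```python
import xarray as xr

VALID_CF_ROLES = ("timeseries_id", "profile_id", "trajectory_id")


def by_cf_role(ds, role):
    """Hypothetical helper: names of variables whose cf_role equals `role`."""
    if role not in VALID_CF_ROLES:
        raise KeyError(f"{role!r} is not one of {VALID_CF_ROLES}")
    return [
        str(name)
        for name, var in ds.variables.items()
        if var.attrs.get("cf_role") == role
    ]


def make_file(trajectory_name):
    """Stand-in for one trajectory file (made-up data)."""
    return xr.Dataset(
        {"temp": ("obs", [1.0, 2.0])},
        coords={"trajectory": ((), trajectory_name, {"cf_role": "trajectory_id"})},
    )


files = [make_file("A"), make_file("B")]

# Find the identifier variable, promote it to a length-1 dimension
# in each file, then concatenate along it.
(id_name,) = by_cf_role(files[0], "trajectory_id")
combined = xr.concat([f.expand_dims(id_name) for f in files], dim=id_name)
```

The scalar identifier in each file becomes a dimension coordinate of the combined dataset, so each trajectory can then be selected by its id.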
Also related: https://ncas-cms.github.io/cfdm/tutorial.html#discrete-sampling-geometries and https://github.com/pydata/xarray/issues/1077#issuecomment-645416425 where we are confused about which of these representations maps more cleanly to a sparse DataArray, and which to a MultiIndexed DataArray
The NCEI netCDF templates look useful (but I haven't looked closely): https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html