cf-xarray

Add support for Discrete Sampling Geometry datasets.

Open malmans2 opened this issue 3 years ago • 6 comments

I ran into this while trying to set up a dataset with a collection of vertical profiles (i.e., a transect).

I think we should consider adding a new axis named "discrete" (see http://cfconventions.org/cf-conventions/cf-conventions.html#discrete-axis). The dimensions of the discrete axis would then be defined by an attribute named "instance_dimension" (see http://cfconventions.org/cf-conventions/cf-conventions.html#collections-instances-elements). The "instance_dimension" attribute is assigned to all "index_variables", which are 1D coordinates.

This is an example where I've extracted a transect from a C-grid model. After extraction I've removed the "axis" attributes from all X and Y variables, and I've added the attribute da.attrs["instance_dimension"] = "station" to all 1D variables with dimension "station".

[Image: "discrete"]

In this scenario, I think the axes should be "Z", "T", and "discrete", where ds.cf.axes["discrete"] = ["station"]. Then we should probably also add ds.cf.index_variables, which returns all index variables (e.g., lon, lat, label, indexes on the original grid, etc.).
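To make the proposal concrete, here is a minimal sketch in plain xarray, assuming a toy transect with made-up values. The helpers `index_variables` and `discrete_dims` are hypothetical names used only for illustration; they are not part of cf-xarray, and the tagging follows the convention described in this comment rather than any confirmed implementation.

```python
import numpy as np
import xarray as xr

# Toy transect: 4 depth levels at 3 stations (all values are made up).
ds = xr.Dataset(
    data_vars={"temp": (("Z", "station"), np.zeros((4, 3)))},
    coords={
        "Z": ("Z", [-5.0, -15.0, -25.0, -35.0], {"axis": "Z"}),
        "lon": ("station", [10.0, 10.5, 11.0]),
        "lat": ("station", [40.0, 40.2, 40.4]),
    },
)

# Tag every 1D variable that lives on the "station" dimension.
for name in list(ds.variables):
    if ds[name].dims == ("station",):
        ds[name].attrs["instance_dimension"] = "station"


def index_variables(ds):
    """Hypothetical helper: names of variables tagged with instance_dimension."""
    return sorted(
        str(name)
        for name, var in ds.variables.items()
        if "instance_dimension" in var.attrs
    )


def discrete_dims(ds):
    """Hypothetical helper: dimensions that would make up the "discrete" axis."""
    return sorted(
        {
            var.attrs["instance_dimension"]
            for var in ds.variables.values()
            if "instance_dimension" in var.attrs
        }
    )


print(index_variables(ds))
print(discrete_dims(ds))
```

In this sketch the "discrete" axis is derived entirely from the attributes, so no "axis" attribute is needed on the lon and lat variables.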

There is also a global attribute named "featureType": http://cfconventions.org/cf-conventions/cf-conventions.html#_features_and_feature_types I'm not sure whether it would be preferable to add ds.cf["discrete"] and ds.cf.index_variables only if that attribute is present. Maybe the axes "X"/"Y" and "discrete" should also be mutually exclusive?

malmans2, Jan 07 '21 14:01

Yes we should support this "discrete sampling geometry" stuff.

I am confused about how these are represented, however. The CF CDL examples (http://cfconventions.org/cf-conventions/cf-conventions.html#_indexed_ragged_array_representation_of_trajectories) don't use a "discrete" attribute, but they do use instance_dimension. cf_role also seems important.

@ocefpaf Can you point us to a "nice" dataset that uses these attributes?

dcherian, Jan 07 '21 17:01

> @ocefpaf Can you point us to a "nice" dataset that uses these attributes?

I believe we have some "gold standards" somewhere. Let me check and get back to you.

ocefpaf, Jan 08 '21 23:01

@dcherian and @ocefpaf, we have USGS oceanographic data in CF-1.6-compliant format, both featureType: timeSeries and featureType: timeSeriesProfile data, on our THREDDS server, where you can download the data as NetCDF or access it via OPeNDAP.

For example, all of the data from this experiment in Grand Bay (thanks to @dnowacki-usgs):

Specific Examples:

rsignell-usgs, Jan 09 '21 12:01

Thanks @rsignell-usgs, those datasets are a lot more straightforward.

instance_dimension is still confusing to me but it is used to represent ragged arrays. For xarray, we could decode this to either a MultiIndexed dataset or a sparse array dataset.
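As a rough illustration of what decoding an indexed ragged array involves, the sketch below pads each instance to a common length with NaN. This dense layout is a simpler stand-in chosen for illustration, not the MultiIndexed or sparse decoding mentioned above, and `ragged_to_dense` is a hypothetical helper with made-up data.

```python
import numpy as np


def ragged_to_dense(values, index, n_instances):
    """Hypothetical helper: unpack an indexed ragged array.

    values: 1D observations along the sample ("obs") dimension.
    index:  for each observation, the instance it belongs to
            (the variable that carries instance_dimension).
    Returns an (n_instances, max_count) array padded with NaN.
    """
    index = np.asarray(index)
    counts = np.bincount(index, minlength=n_instances)
    dense = np.full((n_instances, counts.max()), np.nan)
    filled = np.zeros(n_instances, dtype=int)  # next free slot per instance
    for value, i in zip(values, index):
        dense[i, filled[i]] = value
        filled[i] += 1
    return dense


# Six observations interleaved across three stations (made-up numbers).
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
station_index = [0, 1, 0, 2, 1, 0]
dense = ragged_to_dense(obs, station_index, n_instances=3)
print(dense)
```

The padding wastes memory when instance lengths differ a lot, which is one reason a sparse or MultiIndexed representation may be the better fit for xarray.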

OTOH, cf_role-tagged variables provide a unique identifier for a "trajectory", so if you were concatenating multiple trajectory files, you would create a new coordinate for this cf_role variable and concatenate along that. I think we can support indexing by cf_role. The only valid keys are trajectory_id, timeseries_id, and profile_id.
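A minimal sketch of what looking up variables by cf_role could involve, using plain xarray. The helper `variables_by_cf_role` and the toy dataset (including the identifier "glider-01") are assumptions for illustration, not an existing cf-xarray API.

```python
import xarray as xr

# The three cf_role values listed above.
VALID_CF_ROLES = ("timeseries_id", "profile_id", "trajectory_id")


def variables_by_cf_role(ds):
    """Hypothetical helper: map each cf_role found to its variable names."""
    roles = {}
    for name, var in ds.variables.items():
        role = var.attrs.get("cf_role")
        if role in VALID_CF_ROLES:
            roles.setdefault(role, []).append(str(name))
    return roles


# Toy single-trajectory dataset; the scalar coordinate identifies the trajectory.
ds = xr.Dataset(
    {"temp": ("obs", [10.0, 11.0, 12.0])},
    coords={
        "trajectory": xr.Variable((), "glider-01", {"cf_role": "trajectory_id"})
    },
)

print(variables_by_cf_role(ds))
```

When combining several such files, the variable found under trajectory_id would be the natural candidate for the new concatenation coordinate.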

dcherian, Jan 09 '21 16:01

Also related: https://ncas-cms.github.io/cfdm/tutorial.html#discrete-sampling-geometries and https://github.com/pydata/xarray/issues/1077#issuecomment-645416425, where we are confused about which of these representations maps more cleanly to a sparse DataArray and which to a MultiIndexed DataArray.

dcherian, Jan 09 '21 16:01

The NCEI netCDF templates look useful (but I haven't looked closely): https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html

dcherian, Jun 24 '21 17:06