A cf-xarray compliance checker?

Open kthyng opened this issue 2 years ago • 9 comments

Would something like this be in scope for cf-xarray? It would need to be fairly loosely defined, but maybe the minimum would be that a Dataset has all of its axes and coordinates defined? Would variables need standard_names? Some variables, such as "angle" on a ROMS grid, don't usually have one, though.

kthyng, Sep 30 '22 20:09

A number of these exist:

  • https://github.com/ioos/compliance-checker
  • https://pumatest.nerc.ac.uk/cgi-bin/cf-checker.pl

so I don't think we should reinvent it. It would be nice if we could run a checker on a Dataset using, for example, ds.cf.check(checker="ioos").

cc @ocefpaf

dcherian, Oct 03 '22 16:10

I was looking at CF checkers last week for another project, and it looks like the options are mostly command-line tools meant to check NetCDF files.

It would be great if cf-xarray could check any format supported by xarray, as well as datasets that have not been written to disk. I also think it would be great to use other checkers as backends, but it looks like that would first require changes in compliance-checker and cf-checker (i.e., the checkers only accept paths right now; they would have to accept xarray datasets as well).

malmans2, Oct 03 '22 17:10

It'd be nice to build an API connection, but worst case we can write a tiny dataset with all attributes to /tmp/check.nc and run that, and print the output to screen.
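A minimal sketch of that worst-case fallback, assuming the compliance-checker command-line tool is installed and that `ds` is an xarray Dataset (any object with a `to_netcdf` method would do). The helper names `checker_command` and `check_dataset` are hypothetical and not part of cf-xarray; a temporary directory is used here instead of a fixed /tmp/check.nc.

```python
import subprocess
import tempfile
from pathlib import Path


def checker_command(path, test="cf"):
    """Build the compliance-checker command line for one file."""
    return ["compliance-checker", f"--test={test}", str(path)]


def check_dataset(ds, test="cf"):
    """Write ds to a temporary NetCDF file, run the checker, return its text output.

    Assumption: ds is an xarray.Dataset (only ds.to_netcdf is used).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "check.nc"
        ds.to_netcdf(path)
        result = subprocess.run(
            checker_command(path, test),
            capture_output=True,
            text=True,
        )
    # The checker reports its findings on stdout; print or parse as needed.
    return result.stdout
```

To keep the file tiny, the dataset could be subset before writing, for example with `ds.isel({dim: slice(0, 1) for dim in ds.sizes})`, since most attribute-level checks do not depend on the data values.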

dcherian, Oct 03 '22 17:10

I have mixed feelings. While I don't want to overload cf-xarray with functionality that exists elsewhere, this could be a nice idea because:

  1. what @malmans2 said above
  2. compliance-checker is super verbose and sometimes you don't want a full CF check, just a bare bones "what is missing so I can plot this automatically, or load this data into analysis X." In a way, iris used to be like that but has become more and more restrictive with time.

I guess that, instead of becoming a compliance checker, cf-xarray could have a "verbose mode" where all the compliance issues would be printed when loading a dataset.

ocefpaf, Oct 03 '22 17:10

> "what is missing so I can plot this automatically, or load this data into analysis X."

This is hard to define!

dcherian, Oct 03 '22 18:10

> This is hard to define!

Indeed! That is why cc is super verbose, kind of all or nothing. However, @kthyng's suggestion above looks like a nice start:

  1. axes and coordinates
  2. valid standard_names
  3. enough variables defined to compute say z for example

Beyond that we would get into the weeds of CF, but those three checks cover almost everything needed for plotting with labels.
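A rough sketch of the first two checks, assuming each variable's attributes are available as a plain dict (with xarray, something like `{name: var.attrs for name, var in ds.variables.items()}`). The function name `basic_cf_report` is hypothetical, and detecting axes from the `axis` attribute alone is a simplification: cf-xarray's own detection uses more criteria, and validating names against the standard name table is left out here.

```python
def basic_cf_report(var_attrs, data_vars):
    """Report missing axes and data variables lacking a standard_name.

    var_attrs: mapping of variable name -> attribute dict (assumed input shape).
    data_vars: names of the data variables to check for standard_name.
    """
    # Simplification: only the CF "axis" attribute is used to identify axes.
    found_axes = {}
    for name, attrs in var_attrs.items():
        axis = attrs.get("axis")
        if axis in ("X", "Y", "Z", "T"):
            found_axes.setdefault(axis, []).append(name)

    missing_axes = [a for a in ("X", "Y", "Z", "T") if a not in found_axes]
    no_standard_name = [
        name for name in data_vars if "standard_name" not in var_attrs.get(name, {})
    ]
    return {
        "axes": found_axes,
        "missing_axes": missing_axes,
        "no_standard_name": no_standard_name,
    }


attrs = {
    "lon": {"axis": "X", "standard_name": "longitude"},
    "lat": {"axis": "Y", "standard_name": "latitude"},
    "time": {"axis": "T", "standard_name": "time"},
    "temp": {"standard_name": "sea_water_potential_temperature"},
    "angle": {"long_name": "angle between xi axis and east"},
}
report = basic_cf_report(attrs, data_vars=["temp", "angle"])
print(report["missing_axes"])      # ['Z']
print(report["no_standard_name"])  # ['angle']
```

Returning a report rather than raising keeps the check loose: the caller decides whether a missing Z axis or an unnamed "angle" variable matters for the task at hand.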

ocefpaf, Oct 03 '22 18:10

I wrote some tests for a package: https://github.com/NOAA-ORR-ERD/model_catalogs/blob/main/model_catalogs/tests/test_catalogs.py#L326-L369

When the models are read in with the package, they should be usable by cf-xarray in a basic way. I am finding I need this functionality again, which is why I thought it could be useful in cf-xarray itself. It could warn a user if no axes or coordinates are known for a Dataset/DataArray, and list which data_vars do not have standard_names. I also like the connection @ocefpaf made to being able to calculate z.

kthyng, Oct 07 '22 21:10

NASA-specific compliance checker: https://github.com/eugenegesdisc/diwg-data-compliance-test

dcherian, Jun 05 '24 21:06

> > This is hard to define!
>
> Indeed! That is why cc is super verbose, kind of all or nothing. However, @kthyng's suggestion above looks like a nice start:
>
>   1. axes and coordinates
>   2. valid standard_names

I'd suggest allowing long_names as an option, for those variables that aren't in the standard name table yet. You can add a warning pointing to the forum for adding standard names if you want to discourage long_name without standard_name.
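One way the long_name option could look, as a sketch only: the helper name `describe_variable`, the `allow_long_name` flag, and the warning text are all assumptions made for illustration.

```python
import warnings


def describe_variable(name, attrs, allow_long_name=True):
    """Return which attribute names this variable, or None if it has neither."""
    if "standard_name" in attrs:
        return "standard_name"
    if allow_long_name and "long_name" in attrs:
        # Accept the variable, but nudge towards proposing a standard name.
        warnings.warn(
            f"{name!r} has a long_name but no standard_name; "
            "consider proposing one for the CF standard name table.",
            UserWarning,
            stacklevel=2,
        )
        return "long_name"
    return None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kind = describe_variable("angle", {"long_name": "angle between xi axis and east"})
print(kind, len(caught))  # long_name 1
```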

>   3. enough variables defined to compute say z for example

Everything mentioned in formula_terms or similar, at a guess? Or do you want enough information to convert from the model vertical coordinate to a geometric vertical coordinate?
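The first interpretation (everything named in formula_terms is present) can be sketched as below, again assuming a plain dict of attributes per variable. The function name `missing_formula_terms` is hypothetical; the check only confirms that the referenced variables exist, not that they are enough to compute a geometric vertical coordinate.

```python
def missing_formula_terms(var_attrs):
    """Return {coordinate: [referenced variables that are absent]}.

    var_attrs: mapping of variable name -> attribute dict (assumed input shape).
    """
    missing = {}
    for name, attrs in var_attrs.items():
        formula_terms = attrs.get("formula_terms")
        if not formula_terms:
            continue
        # CF writes formula_terms as "term1: var1 term2: var2 ...", so the
        # tokens that do not end with ":" are the variable names.
        referenced = [tok for tok in formula_terms.split() if not tok.endswith(":")]
        absent = [var for var in referenced if var not in var_attrs]
        if absent:
            missing[name] = absent
    return missing


# ROMS-like example in which the free-surface variable "zeta" was dropped.
attrs = {
    "s_rho": {
        "standard_name": "ocean_s_coordinate_g2",
        "formula_terms": "s: s_rho C: Cs_r eta: zeta depth: h depth_c: hc",
    },
    "Cs_r": {},
    "h": {},
    "hc": {},
}
print(missing_formula_terms(attrs))  # {'s_rho': ['zeta']}
```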

> Beyond that we would get into the weeds of CF, but those three checks cover almost everything needed for plotting with labels.

I'd suggest a fourth check for units: it's possible to guess them from the values, but I like having them stated explicitly.
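A sketch of such a units check under the same plain-dict assumption (hypothetical helper name `missing_units`). It only tests that a non-empty units attribute is present; validating the unit string itself would need an additional library.

```python
def missing_units(var_attrs, exempt=()):
    """Return sorted names of variables without a non-empty units attribute.

    exempt: variables that legitimately carry no units (e.g. flag/mask variables).
    """
    return sorted(
        name
        for name, attrs in var_attrs.items()
        if name not in exempt and not str(attrs.get("units", "")).strip()
    )


attrs = {
    "temp": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
    "salt": {"standard_name": "sea_water_practical_salinity"},
    "mask_rho": {"flag_values": [0, 1], "flag_meanings": "land water"},
}
print(missing_units(attrs, exempt=("mask_rho",)))  # ['salt']
```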

DWesl, Jun 25 '24 19:06