
Checking out table versions (e.g. cmip6-cmor-tables) alongside CMOR install

Open durack1 opened this issue 1 year ago • 11 comments


Conversation with @sashakames noted that, for PrePARE to validate data as it currently works, we need to match the data_specs_version attribute to the cmip6-cmor-tables version that CMOR (or other software) used to generate the file(s). We hit an issue with INM data where tables version 01.00.29 was used to create the data, but PrePARE with the 01.00.33 (latest) tables was used to validate it, and a single issue appeared.

It would be great if CMOR checked out the appropriate tables version when it first encounters a given data_specs_version identifier, and then kept that version available to the software the next time it is required during publication/PrePARE checking.
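A minimal sketch of the "check out once, reuse later" idea follows. The cache directory, the function names, and the mapping from a data_specs_version to a git tag of cmip6-cmor-tables are all hypothetical (the mapping is left to the caller); only `git clone --depth 1 --branch <tag>` is standard git.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical cache location; CMOR does not define one today.
DEFAULT_CACHE = Path.home() / ".cache" / "cmor-tables"
TABLES_REPO = "https://github.com/PCMDI/cmip6-cmor-tables.git"


def git_fetch(tag: str, dest: Path) -> None:
    """Shallow-clone a single tag of the tables repo into dest (needs git on PATH)."""
    subprocess.run(
        ["git", "clone", "--depth", "1", "--branch", tag, TABLES_REPO, str(dest)],
        check=True,
    )


def tables_for(
    data_specs_version: str,
    tag_for: Callable[[str], str],
    fetch: Callable[[str, Path], None] = git_fetch,
    cache: Path = DEFAULT_CACHE,
) -> Path:
    """Return a local checkout matching data_specs_version.

    The checkout happens only the first time a version is requested; later
    calls reuse the cached directory. tag_for maps a data_specs_version such
    as "01.00.29" to a git tag; that mapping is an assumption, not a known rule.
    """
    dest = cache / data_specs_version
    if not dest.exists():
        cache.mkdir(parents=True, exist_ok=True)
        fetch(tag_for(data_specs_version), dest)
    return dest
```

Passing `fetch` in as a parameter keeps the caching logic testable without network access.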

We can obviously roll this forward in https://github.com/PCMDI/mip-cmor-tables/issues/3

durack1, Oct 06 '22 21:10

We might assume that the tables are backward compatible (with new variables added, but no changes to existing variables). Any info on the single issue raised by PrePARE?
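The backward-compatibility assumption can be checked mechanically. A rough sketch, assuming each table has been loaded from its JSON file and keeps its variables under a "variable_entry" key; a correction to an existing variable's standard_name would show up under "changed".

```python
def compat_report(old_table: dict, new_table: dict) -> dict:
    """Diff two versions of one CMOR table.

    Each table is assumed to be parsed JSON with a "variable_entry" mapping
    of variable name -> attribute dict. The tables are backward compatible in
    the sense above only if "removed" and "changed" are both empty.
    """
    old = old_table.get("variable_entry", {})
    new = new_table.get("variable_entry", {})
    # For each variable present in both versions, list the attributes that differ.
    changed = {
        name: sorted(k for k in old[name].keys() | new[name].keys()
                     if old[name].get(k) != new[name].get(k))
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }
```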

taylor13, Oct 07 '22 12:10

@sashakames it seems we've solved the seats issue, so I'm tagging you here; this will likely require @mauzey1 at some stage.

durack1, Oct 25 '22 23:10

Here's the issue raised, if that's what you're after:

Your file contains "standard_name":"effective_radius_of_cloud_liquid_water_particle_at_liquid_water_cloud_top" and
CMIP6 tables requires "standard_name":"effective_radius_of_cloud_liquid_water_particles_at_liquid_water_cloud_top".

sashakames, Oct 26 '22 15:10

@durack1 So we want PrePARE to warn users that the data_specs_version of a file doesn't match the one in the tables being used by PrePARE? Should that just be a warning, or an error?

mauzey1, Oct 26 '22 16:10

@mauzey1 that would be a nice addition, as a warning.

Thinking about this further, ideally we would always check against the latest version of the tables, so that issues that are known (and fixed in the latest tables) are never allowed to be published.

Checking out the latest table versions at PrePARE/CMOR install time or runtime was the original focus of this issue.

durack1, Oct 26 '22 18:10

Thanks, @sashakames, for the error message. "Your table" has a typo ("particle" instead of "particles"). Since this involves a standard_name (which must be found at http://cfconventions.org/Data/cf-standard-names/79/build/cf-standard-name-table.html), I think an "error" should be raised. The table with the error should be corrected before proceeding (even if it is the "master" table that's wrong).

taylor13, Oct 26 '22 20:10

@taylor13 just circling back on this. The tables the original data were written with included the error, so they had created data validated by CMOR and cmip6-cmor-tables. The issue arose when PrePARE was used to validate the data during publishing with an updated, corrected version of the tables. That is why I was suggesting a warning rather than an error and exit.

durack1, Oct 27 '22 04:10

Got it. You suggest we not burden users with fixing a problem that was only recently discovered (between the time they wrote the data with CMOR and the time it was checked by PrePARE). There are at least two cases where we might not want to be that lenient:

  1. When a user has prepared output without using the CMOR tables and without CMOR, and has made an error in defining the standard_name.
  2. When a user has been lazy and not obtained the latest CMOR table before writing output. (The table used might be one made available when the data request was still evolving, long before the time that the user is writing data.) When that data is published, the updated tables are applied and the error is discovered. Shouldn't we object to such sloppy application of CMOR and require the data be rewritten?

All that being said, I wouldn't strongly object to making this a warning (rather than an error) since it doesn't involve the DRS (i.e., the attributes that determine unique file names and directory structures).
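One way to encode the "warning unless it touches the DRS" distinction is sketched below. The attribute set is an illustrative subset of the CMIP6 DRS components, not an authoritative list, and the function name is hypothetical.

```python
# Illustrative subset of CMIP6 DRS attributes (those that feed file names and
# directory paths); an authoritative list would come from the DRS/CV documents.
DRS_ATTRIBUTES = frozenset({
    "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
})


def split_by_severity(mismatched: list[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings): DRS mismatches are errors, all others warnings."""
    errors = [a for a in mismatched if a in DRS_ATTRIBUTES]
    warns = [a for a in mismatched if a not in DRS_ATTRIBUTES]
    return errors, warns
```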

taylor13, Oct 27 '22 14:10

My thought on the warning message is that PrePARE would check the table version against the data_specs_version global attribute found in the file. If there is a mismatch, it would warn the user, in addition to reporting any errors encountered as a result of the mismatch. I agree that PrePARE shouldn't pass data that contain errors, as there is no distinction between a minor error, like the typo in the example above, and something where the CF name was completely garbled.
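A minimal sketch of that behavior, assuming the file's attributes have already been read into a flat dict (this is not PrePARE's actual code): a version mismatch only adds a warning, while attribute mismatches are still reported as errors.

```python
import warnings


def validate(file_attrs: dict, table_attrs: dict, tables_version: str):
    """Return (warnings, errors) for one file.

    file_attrs: attributes read from the file (flattened, for simplicity).
    table_attrs: the attributes the tables require for this variable.
    """
    warns = []
    file_version = file_attrs.get("data_specs_version")
    # A version mismatch is reported but does not by itself fail validation.
    if file_version != tables_version:
        msg = (f"file data_specs_version {file_version!r} does not match "
               f"tables version {tables_version!r}")
        warnings.warn(msg)
        warns.append(msg)
    # Any attribute that differs from the tables is still an error.
    errors = [
        f"{key}: file has {file_attrs.get(key)!r}, tables require {required!r}"
        for key, required in table_attrs.items()
        if file_attrs.get(key) != required
    ]
    return warns, errors
```

The file passes only if `errors` is empty, whatever the warnings say.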

Additionally, we had the notion of a minimum data_specs_version. While this would be helpful to have in the publisher (on the client side), ultimately it is better to enforce it on the server side. I'll raise this when requirements for publication services are discussed.

Following the warning, users may have the opportunity to downgrade their table version (provided it is still valid and meets the minimum) and publish.
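The minimum-version and downgrade rule might look like the sketch below. The function names are hypothetical, and "available" stands in for whatever list of still-valid table versions a publication service would maintain.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """'01.00.29' -> (1, 0, 29), so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def tables_version_to_use(file_version: str, minimum: str, available: set[str]):
    """Return the tables version to validate against after a mismatch warning.

    The file's own data_specs_version is used (the downgrade) only if those
    tables are still available and not older than the minimum. Otherwise this
    sketch returns None, meaning the data would need to be rewritten.
    """
    if file_version in available and parse_version(file_version) >= parse_version(minimum):
        return file_version
    return None
```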

sashakames, Oct 27 '22 14:10

It would be useful to consider this alongside the future CMOR4 release. As we are standardizing on the mip-cmor-tables across projects, it certainly makes sense to "ship" CMOR with the inputs it requires to run.

durack1, Apr 07 '24 16:04