CESM Add more metadata on compset-grid compatibility

It is currently hard for users to know which grids are supported for a given compset, and which of those are recommended for scientific use. We should add better documentation on this, probably generated from additional metadata in one or more XML or YAML files.

@alperaltuntas has started adding more metadata along these lines for the sake of the create_newcase GUI he is working on. He says:

The GUI takes into account the following relational metadata: (1) the compset and not_compset attributes in config_grids.xml and (2) any rules defined in config_comply.yml involving grids. See, for example: https://github.com/alperaltuntas/cime/blob/627995b8f582325e054374909663bc3635799b7b/config/cesm/config_comply.yml#L77

Using these two sources of metadata, the tool comes up with a full list of compatible grids. However, the tool initially shows only a subset, i.e., the suggested grids, which are all the grids that are compatible and have <desc> entries, as you said. Subsequently, the user can expand the list to show all compatible grids, including the ones that don't have <desc>.
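The selection logic described above can be sketched roughly as follows. This is only an illustration, not the GUI's actual code: the `Grid` class and both function names are hypothetical, it assumes the compset / not_compset attributes are regular expressions searched against the compset long name, and it leaves out the config_comply.yml rules entirely.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Grid:
    """Hypothetical stand-in for one grid entry in config_grids.xml."""
    alias: str
    compset: Optional[str] = None      # assumed: regex the compset long name must match
    not_compset: Optional[str] = None  # assumed: regex the compset long name must not match
    desc: Optional[str] = None         # text of the <desc> entry, if any


def is_compatible(grid: Grid, compset_longname: str) -> bool:
    """Apply the compset / not_compset attributes of a single grid."""
    if grid.compset and not re.search(grid.compset, compset_longname):
        return False
    if grid.not_compset and re.search(grid.not_compset, compset_longname):
        return False
    return True


def list_grids(grids: List[Grid], compset_longname: str,
               show_all: bool = False) -> List[Grid]:
    """Return the suggested grids, or every compatible grid if show_all is set."""
    compatible = [g for g in grids if is_compatible(g, compset_longname)]
    if show_all:
        return compatible
    # Suggested grids are the compatible ones that carry a <desc> entry.
    return [g for g in compatible if g.desc]
```

Calling `list_grids(grids, longname)` would give the initial, suggested list, and `list_grids(grids, longname, show_all=True)` the expanded one.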

I can think of the following possible metadata for compset-grid compatibility:

(a) For a given grid, what compsets are / are not compatible: currently specified with compset / not_compset attributes in config_grids.xml; in addition, Alper has an example for T42 in config_comply.yml (T42 can only be used with SCAM)

(b) Similar to (a), but in the other direction: For a given component or specific compset, what grids are / are not compatible: not currently specified in general, but Alper has an example in config_comply.yml (any MOM compset is only compatible with 3 grids)

(c) For a given compset, at which resolutions (if any) is it scientifically-supported: currently defined in config_compsets.xml

(d) For a given compset, what are all of the standard / recommended resolutions at which it is typically run in scientific simulations: not currently specified, but I feel this should be specified to address Dan Marsh's request and similar needs

(e) For a given compset, what are all of the resolutions at which it is regularly tested: can be retrieved from the testlist files, but I'd argue that this is not particularly useful for a user, at least for the I compsets with which I am most familiar

Thinking about uses other than the GUI: I think that (c) and (d) are what we want to present in our online documentation (and also ideally accessible by CIME's command-line tools), whereas (a) and (b) should be used for checking compatibility at create_newcase time and possibly also queryable via some command-line tool – so should be accessible to CIME as well as the GUI.

I would further suggest that (c) and (d) would be easiest to maintain and keep correct if specified for each compset in config_compsets.xml, since this needs to be considered whenever you add a new compset. For (d), one thought is that each compset defined in config_compsets.xml could define an element like <standard_resolutions>. This would be somewhat subjective, but it would answer the question, "What resolutions do you typically run this compset at for scientific runs?" This would require some periodic maintenance, but it seems like the burden wouldn't be too great. (We could consider whether it makes sense to have a default value that applies to all compsets defined in a given file. This might make sense for I compsets, but maybe not for F compsets or some others. The downside is that a default could make it easy to forget to add a compset-specific list in cases where a given compset differs from the default, so it might be safest just to require this to be specified explicitly for each compset.)
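To make the <standard_resolutions> idea concrete, here is a minimal sketch of how such an element might look and be read. Everything about the element's layout is an assumption: the thread only proposes the element name, so the nested <grid> children, the compset aliases and long names, and the function name `resolutions_by_compset` are all made up for illustration. Compsets without the element are reported rather than given a default, matching the "require it explicitly" option.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the style of config_compsets.xml. The
# <standard_resolutions> element and its <grid> children are assumptions;
# aliases, long names, and grid values are placeholders.
EXAMPLE_XML = """
<compsets version="2.0">
  <compset>
    <alias>IExample</alias>
    <lname>EXAMPLE_LONG_NAME</lname>
    <science_support grid="f09_g17"/>
    <standard_resolutions>
      <grid>f09_g17</grid>
      <grid>f19_g17</grid>
    </standard_resolutions>
  </compset>
  <compset>
    <alias>IExampleNoList</alias>
    <lname>ANOTHER_EXAMPLE_LONG_NAME</lname>
  </compset>
</compsets>
"""


def resolutions_by_compset(xml_text):
    """Collect (c) and (d) per compset; list compsets lacking an explicit (d)."""
    root = ET.fromstring(xml_text)
    found = {}
    missing = []
    for compset in root.findall("compset"):
        alias = compset.findtext("alias")
        node = compset.find("standard_resolutions")
        if node is None:
            # No default is applied: the omission is surfaced instead.
            missing.append(alias)
            continue
        found[alias] = {
            "scientifically_supported": [
                s.get("grid") for s in compset.findall("science_support")
            ],
            "standard": [g.text.strip() for g in node.findall("grid")],
        }
    return found, missing
```

A check like this could run whenever a compset file changes, so that a new compset without an explicit list is flagged at review time.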

For (a) and (b) I'm coming to like the idea of using something like config_comply.yml (whether it's in yaml or xml), because that allows grid-centric and compset-centric rules to all be specified in the same place. However, I feel that it's problematic to have some rules specified in config_grids.xml and others specified in config_comply.yml: I think we want all of these rules in a single place that is accessible from both the GUI and CIME's command-line tools.

@alperaltuntas @mvertens @jedwards4b @briandobbins

Oct 28 '21 00:10 billsacks

From a discussion today with @alperaltuntas @mvertens @jedwards4b @briandobbins:

For this point:

(d) For a given compset, what are all of the standard / recommended resolutions at which it is typically run in scientific simulations: not currently specified, but I feel this should be specified to address Dan Marsh's request and similar needs

While we see value in this, we are also concerned that it would be difficult to maintain and would likely get out of date. @mvertens proposed starting by creating a document that describes some general rules, for example that F compsets are typically run with the ocean grid being the same as the atmosphere grid. Then we can see if that is sufficient (and could be added to some static documentation somewhere) or if we want to do more than that.

For these points:

(a) For a given grid, what compsets are / are not compatible: currently specified with compset / not_compset attributes in config_grids.xml; in addition, Alper has an example for T42 in config_comply.yml (T42 can only be used with SCAM)

(b) Similar to (a), but in the other direction: For a given component or specific compset, what grids are / are not compatible: not currently specified in general, but Alper has an example in config_comply.yml (any MOM compset is only compatible with 3 grids)

I felt there was general agreement that we want to specify this sort of compatibility information in a single format, not have some of it in config_grids.xml and some in config_comply.yml. There seemed to be general support for something like config_comply.yml because it allows the same syntax to be used to specify grid-centric and compset-centric rules. Regarding the tension between having information like this in a single, central location vs. a decentralized one, @alperaltuntas pointed out that we could split up config_comply.yml just like we do for some of the xml files.
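As a rough illustration of why a single rule syntax is attractive, the sketch below expresses one grid-centric and one compset-centric rule in the same form and checks a case against both. It does not reproduce the real config_comply.yml syntax: the `Rule` class, the `violations` function, the regex patterns, and the grid names grid_a / grid_b / grid_c are all hypothetical, and the two rules are taken from the examples in this thread as illustrations, not as confirmed restrictions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: if when_field matches, then_field must match too."""
    when_field: str    # "grid" or "compset"
    when_pattern: str  # regex applied to when_field
    then_field: str    # "grid" or "compset"
    then_pattern: str  # regex that then_field must satisfy
    message: str


RULES = [
    # Grid-centric rule, as in (a).
    Rule("grid", r"^T42", "compset", r"SCAM",
         "T42 grids can only be used with SCAM"),
    # Compset-centric rule, as in (b); the three grid names are placeholders.
    Rule("compset", r"_MOM6", "grid", r"^(grid_a|grid_b|grid_c)$",
         "MOM compsets are only compatible with three grids"),
]


def violations(rules: List[Rule], grid: str, compset: str) -> List[str]:
    """Return the messages of all rules that the grid/compset pair breaks."""
    case = {"grid": grid, "compset": compset}
    return [
        rule.message
        for rule in rules
        if re.search(rule.when_pattern, case[rule.when_field])
        and not re.search(rule.then_pattern, case[rule.then_field])
    ]
```

Because the rules are just a flat list, splitting them across several files (for example one per component) and concatenating the lists at load time would not change the checking code.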

We also discussed file format. This is important if we want this file to be used by the command-line interface (like create_newcase) in addition to the GUI – which I think we do want. In principle, it seems like config_comply.yml could be specified in xml, but it might be harder to work with and would require some up-front effort from @alperaltuntas that we're not sure is worthwhile. The general feeling seemed to be that we'll probably start supporting a limited set of third-party Python dependencies at some point. @jedwards4b feels that this will be easier once we have the CIME7 reorganization (https://github.com/ESMCI/cime/issues/3886).

So the thinking is that, for now, let's stick with business as usual: not removing any of the compset-grid compatibility metadata from config_grids.xml, and having Alper add some new metadata in config_comply.yml, but only having the latter used by the GUI for now. Then, once the CIME7 reorganization is done, we can think about how we want to support third-party Python dependencies (see also https://github.com/ESMCI/cime/issues/4059). Once we have a plan in place for that, we can return to the question of whether all of this compset-grid compatibility metadata should be moved into config_comply.yml.

Nov 03 '21 16:11 billsacks

(a) For a given grid, what compsets are / are not compatible: currently specified with compset / not_compset attributes in config_grids.xml; in addition, Alper has an example for T42 in config_comply.yml (T42 can only be used with SCAM)

Where did this restriction originate? The CESM Simpler Models page uses this grid for several examples and we regularly test these compsets plus some aquaplanet configurations using --res T42_T42_mg17.

Nov 03 '21 17:11 gold2718