cmip6-cmor-tables
Adding a license file to this repo for conda-forge releases?
@durack1 @mauzey1 Hey Paul and Chris,
The E3SM data publication team is working on a solution to package the external data needed for cmorizing E3SM data and release it as an independent data package. This would significantly streamline our data publication workflow and make it more portable. To do so, we need a license file to be included in the [cmip6-cmor-tables](https://github.com/PCMDI/cmip6-cmor-tables) repo. A detailed discussion can be found here: https://github.com/E3SM-Project/e3sm_to_cmip/issues/124. Would you please consider adding a license? Thanks a lot!
@chengzhuzhang @xylar interesting question. In principle I have no problem adding a license, one question, what license flavour would make it easiest?
One reluctance I have in proceeding: while wrapping this repo into a conda package and automating its download will aid automated use, it further abstracts the requirement to register/update model/institution info away from the user. It's not quite a set-and-forget process.
Hey Paul, thanks for chiming in. Since both the CMIP6_CVs and cmip6-cmor-tables repos are open source on GitHub, perhaps it is not too much of a concern that more users would skip the WCRP registration requirement through a conda package? The conda package does add one more entry point to these tables, though.
As a workaround: it seems like https://github.com/WCRP-CMIP/CMIP6_CVs and cmip6-cmor-tables share the same set of JSON files? The former has a license already. Maybe we should use CMIP6_CVs instead? (And I'm curious, what's the difference between the two?) Thank you.
Hi Jill, the two repos are separate. WCRP-CMIP/CMIP6_CVs is the controlled vocabulary (and registration) repo for CMIP6, where institutions, models, experiments, MIPs, etc are defined. A subset of the registered information is pulled into the PCMDI/cmip6-cmor-tables/Tables/CMIP6_CV.json file, which allows a modeling group to use CMOR with minimum configuration, as the registered information is available for use by the software, packaged in the cmip6-cmor-tables.
The WCRP-CMIP/CMIP6_CVs license is a default file license template for use in the netCDF global attribute license field, so it isn't the kind of license that is required to create a conda-forge package (@xylar can correct me here if I'm wrong).
I think that's correct. We need a license file that describes limitations (if any) on packaging and redistributing the contents of the repository. The license linked to on https://github.com/WCRP-CMIP/CMIP6_CVs is not that kind of license. It is a license the software adds to contents it produces.
Hey Paul, thank you for the clarification. It seems one goal of this repo is to facilitate its use together with CMOR so that modeling groups can create CMIP-compliant files, so I think it makes sense to have a conda release of this repository? The goal on the E3SM side is to be able to port the cmorization process easily to supported machines, and maintaining a conda package of this repo is a better approach than manually cloning and updating the repo on each platform. @xylar and I will help maintain the conda package if a license can be added. Thanks for your consideration!
@chengzhuzhang in principle I have no problem with this, however, we'd need to know what license makes using a conda-forge archive easiest?
@mauzey1 do you see any problems with this?
Thank you @durack1! According to general guidelines for open-source software, I'm thinking either an Apache 2.0, BSD or MIT license would be appropriate (I'm reading this page: https://software.llnl.gov/about/licenses/). @xylar would you agree that any of these license types should be sufficient?
@durack1 @chengzhuzhang You mean having the CMIP6 CMOR tables' JSON files stored in a conda-forge package? Like a package that would install the directory of tables in <anaconda_path>/envs/my_env/share/cmip6-cmor-tables?
@mauzey1, yes, exactly. I have a recipe for doing that already here: https://github.com/xylar/staged-recipes/tree/add_cmip-cmor-tables/recipes/cmip6-cmor-tables
@chengzhuzhang, those are definitely common licenses on conda-forge so they would work for me.
@xylar Okay, then I agree with creating a conda-forge package for cmip6-cmor-tables.
I can certainly see the utility of having a conda package for the tables, but when it comes to versioning these tables please have a good think about the labelling of the package and how that updates.
For Met Office work I have a set of versions that we use for CMIP6 production (01.00.29, 01.00.31, 01.00.32), but we update the CMIP6_CV.json file within these directories with the latest one when something important changes (I recall doing this for fixes to experiment parent details and the introduction of a new model). You could choose a composite version number to represent the inputs (data request version + CVs version (+ CMOR version?)). In some ways it would be good to separate the MIP tables from the CVs, but this would likely be a bit disruptive.
One other thought; how often would a new conda package be published? Updates to the CVs still occur reasonably regularly with new models being added, although a bit of automation should cover this fairly easily.
@matthew-mizielinski, conda-forge can easily accommodate versioning like you suggest.
A bot can automatically create a new package each time there is a new version update as long as the recipe doesn't need to change (other than the version and sha256 hash of the files, which the bot updates). Maintenance on that side should be a piece of cake.
@xylar @matthew-mizielinski thanks for getting into the weeds with this. We do capture the CV, DREQ and CMOR versions in the release comment (see here), but importantly, a new version is not released when the CVs change (which happens relatively frequently), so unless we changed that process, the conda package would never update with the latest CVs - requiring a manual step by a user, not ideal
How important would the CVs be for the tools that would typically use the proposed conda-forge package? It isn't necessarily a problem if that file gets updated infrequently, as long as that is clear to users of the conda-forge package.
The path that a new modeling group is meant to follow to use CMOR:
- register their institution and model in the CVs (institution_id and source_id)
- this information is then propagated across to the cmip6-cmor-tables/Tables/CMIP6_CV.json file, and then a user can just select their information from the preconfigured/registered info to write files (in addition to configuring input files for CMOR use)
- as the information is registered in the CMIP6_CVs this then triggers downstream support, ESGF publisher, citation, ES-DOCs etc in a consistent way
So to aid a user, having the most up-to-date CMIP6_CV.json file would certainly be a necessity, otherwise hand-spun edits will be required which may break consistency with the registered information that other software expects
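The registration path above can be illustrated with a minimal sketch that checks whether an institution_id and source_id are present in a loaded CMIP6_CV.json before running CMOR. The layout assumed here (a top-level "CV" key holding "institution_id" and "source_id" mappings, with each source listing its institutions) is an assumption about the file, and the function name and sample IDs are hypothetical.

```python
import json


def is_registered(cv: dict, institution_id: str, source_id: str) -> bool:
    """Return True if both IDs are in the CV and the institution is listed for the source.

    Assumes the layout {"CV": {"institution_id": {...}, "source_id": {...}}}.
    """
    entries = cv.get("CV", {})
    institutions = entries.get("institution_id", {})
    sources = entries.get("source_id", {})
    if institution_id not in institutions or source_id not in sources:
        return False
    # Each source entry is assumed to carry a list of registered institutions.
    return institution_id in sources[source_id].get("institution_id", [])


# Made-up sample that mimics the assumed layout; not real registered content.
sample_cv = json.loads(
    """
    {
      "CV": {
        "institution_id": {"EXAMPLE-INST": "An example institution"},
        "source_id": {
          "EXAMPLE-MODEL": {"institution_id": ["EXAMPLE-INST"]}
        }
      }
    }
    """
)

if __name__ == "__main__":
    print(is_registered(sample_cv, "EXAMPLE-INST", "EXAMPLE-MODEL"))
```

A check like this only reports whether the local copy of the file knows about a model; it does not replace registering in the CMIP6_CVs repo, which is what triggers the downstream support.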
Okay, that's good to know. Why not do a release every time the CMIP6_CV.json file gets updated?
There is no way to do a conda-forge package from anything other than a release and it isn't a good idea to overwrite or edit files in a conda environment, since they might unexpectedly get overwritten by a later update, etc.
@xylar there has been no need to (up until now), and our versioning tag doesn't account for the CV version, however, the tag/release comment does
Okay, I'll leave this to the rest of you to discuss. I know for e3sm_to_cmip, the current situation of each user cloning this repo is not working well. I'm happy to help with conda-forge packaging if that ends up being practical. But I don't understand enough of the subtleties or who the end users might be to weigh in on those details.
@xylar a practical question from me: do you expect folks to run conda update on their env every time they go to use it?
That would be a question for @chengzhuzhang. It sounds like it might be worth including a conda update as part of the e3sm_to_cmip workflow to make sure the latest version of cmip6-cmor-tables is being used.
Hmmm... conda update would only help if the package is released frequently enough (i.e., each time user-facing features are updated?). Now I have learned more about the use of this repo. I understand that for an existing registered modeling center, most changes to CMIP6_CV.json don't really have an impact, unless its own registered information is updated.
@chengzhuzhang yeah exactly, so when we updated E3SM1-0 to include the UCI institution_id, if you had cloned the repo after this was merged (and pulled across into the cmip6-cmor-tables repo) then you'd have no further tweaks to apply, assuming that you're also using the latest ESGF publisher, which may validate your entries.
Thanks for all the clarifications @durack1. With this, I think that without changing the current release plan, a conda package may not be the best and most practical approach...
@xylar, I think there is another possibility: to manage the data in this repo in a similar fashion to how we manage our analysis datasets. This is a less automatic approach, but we would perhaps have better control over the schedule for updating these data files. We can talk more offline about this possibility.
@chengzhuzhang, let's pause for a second and let me chat with @taylor13, @matthew-mizielinski and @mauzey1 to figure out if a conda package makes sense. To be honest, I like the ease of use this would enable, however, there are questions remaining in how we'd make sure things are kept up to date without causing more problems (and work) than it solves
@chengzhuzhang, another feasible approach is to have e3sm_to_cmip clone this GitHub repo into some predefined scratch space (specific to a user, rather than shared) as the first step of running. That way, you would always have the latest version without the need for a release.
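A minimal sketch of the clone-into-scratch idea, assuming git is on the PATH and that a per-user scratch directory is passed in by the caller. The function names are hypothetical and this is not how e3sm_to_cmip currently behaves.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/PCMDI/cmip6-cmor-tables.git"


def sync_command(dest: Path, repo_url: str = REPO_URL) -> list[str]:
    """Build the git command: pull if dest is already a clone, otherwise clone."""
    if (dest / ".git").is_dir():
        # Existing clone: fast-forward to the latest commit on its branch.
        return ["git", "-C", str(dest), "pull", "--ff-only"]
    # First run: a shallow clone is enough, since only the latest files are needed.
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def sync_tables(dest: Path) -> Path:
    """Clone or update the repo under dest and return the path to its Tables directory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(sync_command(dest), check=True)
    return dest / "Tables"
```

Calling sync_tables with a user-specific path at the start of a run would give the latest tables without waiting for a release, at the cost of requiring network access on every run.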
It sounds like my approach or yours is more feasible than a conda-forge package.
Thank you everyone for the discussion.
@xylar either way, adding a license is a trivial step, would CC BY-SA 4.0 cause you any issues in conda-forge land?
@durack1, CC BY-SA 4.0 should work just fine. I was able to find several existing packages with that license.
@xylar, just a thought, but I use a python class for something similar; cartopy.io.Downloader
@durack1, it wouldn't be too hard to write an API to retrieve tables and CVs with something like this downloader class. That would be something that could go to conda, and a simple function call like cmor_tables_location(table_version='01.00.33', cv_version='latest', destination='<somewhere>') would be able to provide access to any version of the tables.
@matthew-mizielinski, so I've given the downloader some thought. I think what would work well for our software (e3sm_to_cmip) is to have the conda package cmip6-cmor-tables for most files. This is preferable to a downloader in that all we have to do is make a specific version (or a constrained version) of the package a dependency of our software, and the downloading happens automatically without any extra steps. But a simple download command for cmip6-cmor-tables/Tables/CMIP6_CV.json at the beginning of our process is a really good idea, so we don't rely on the (potentially outdated) version from the package. Since that's a single file, it wouldn't require anything fancy like its own downloader; we could just use requests.
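A minimal sketch of that single-file download with requests. The raw.githubusercontent.com URL pattern is standard for GitHub, but the default branch name ("main") and the function names are assumptions, so the ref is left as a parameter.

```python
from pathlib import Path

RAW_BASE = "https://raw.githubusercontent.com/PCMDI/cmip6-cmor-tables"


def cv_url(ref: str = "main") -> str:
    """Build the raw URL for Tables/CMIP6_CV.json at a branch or tag (ref is an assumption)."""
    return f"{RAW_BASE}/{ref}/Tables/CMIP6_CV.json"


def fetch_cv(dest: Path, ref: str = "main", timeout: float = 30.0) -> Path:
    """Download CMIP6_CV.json to dest, failing loudly on HTTP or JSON errors."""
    # Imported here so cv_url() stays usable where requests is not installed.
    import requests

    response = requests.get(cv_url(ref), timeout=timeout)
    response.raise_for_status()
    response.json()  # Raises if the body is not valid JSON, before anything is written.
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(response.content)
    return dest
```

Given the earlier point about not editing files inside a conda environment, dest would presumably be a working directory outside the environment, with the rest of the tables read from the installed package.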