MDTF-diagnostics icon indicating copy to clipboard operation
MDTF-diagnostics copied to clipboard

Support for yaml settings files

Open jkrasting opened this issue 4 years ago • 4 comments

We've run into a couple of issues with JSON's clunky syntax. A few drawbacks thus far:

  • JSON is rather unforgiving - a missed comma can break things
  • JSON does not natively support comments
  • JSON is not always readable with the "{" and "[" characters.

Here is a draft of the src/default_tests.jsonc converted over to src/default_tests.yaml format.

We could use PyYAML to parse the yaml format.

jkrasting avatar Nov 30 '20 16:11 jkrasting

Documenting the context for this issue:

  1. We originally used YAML for configuration, for the reasons @jkrasting suggests. However, in an email to him on 5 Nov 2019 in response to an installation error, I argued for moving to the current JSON-with-comments format:

"The error on analysis is because PyYAML is a dependency: the .csh wrapper creates a virtualenv and installs it there before calling the framework. For me, this tips the scales in favor of JSON over YAML for configuration files: it means the framework wouldn't depend on anything not in the python standard library. If dependencies are a snag for us, they'll probably be so for other users as well. Other than simplicity, the argument in favor of YAML was that it permitted comments, but I can wrap python's JSON parser with a regex to strip out comments."

The grounds for this objection no longer apply: we create a base conda environment that the framework itself runs in, rather than using whatever the system Python might be. Trying to write the entire framework using only the standard library was far too limiting.

Furthermore, the PR for adding level extraction/other preprocessing (NOAA-GFDL/MDTF-diagnostics#40) adds xarray to this environment, and that brings in pyyaml as a dependency, so we (will) already have pyyaml in the framework's environment.

  1. YAML is more "implicit" than JSON, but that can lead to issues when the parser makes incorrect assumptions for you. This is a pretty brief summary of the potential issues: eg 2013_05_10 is parsed as an integer 20130510 rather than a string. I'm only mentioning this for documentation purposes; I agree with @jkrasting that YAML will be far more convenient for users (eg, on the basis of native support for comments alone) and we're unlikely to hit these "gotchas" in practice.

  2. The actions agreed upon at this morning's meeting are:

  • Implement parsing logic for YAML config files, alongside the current JSON one. Determine which branch to use based on filename extension.
  • Migrate existing config files for the framework and PODs to YAML.
  • Update user docs to reflect this change.
  • Announce the change to users, and set a future deadline for deprecation of the JSON format.

This shouldn't be too much of an imposition, because no additions or extra unit tests will be needed for pyyaml's parsing function (unlike how we need to manually strip comments from JSON files), and translation between the two formats is feasible to do manually for the relatively short files we deal with.

tsjackson-noaa avatar Dec 01 '20 21:12 tsjackson-noaa

The switch to yaml seems reasonable. But how many users are going to be tripped up? Did anything get released with JSON files?

andrewgettelman avatar Dec 01 '20 21:12 andrewgettelman

The plan would be to support both formats for some period of time and issue a depreciation warning when using json files in order to encourage the migration to YAML.

jkrasting avatar Dec 02 '20 00:12 jkrasting

Documenting and referencing a recent issue #309 that lays out the plan to eventually go the yaml route. Stay tuned. The yaml example from this issue is useful.

aradhakrishnanGFDL avatar Jan 24 '22 16:01 aradhakrishnanGFDL