ClimaParams.jl

TOML schema

Open simonbyrne opened this issue 3 years ago • 15 comments

I think there has been a general consensus that we should move to a TOML format. The details of how parameters should be specified in this file still need to be worked out.

@odunbar did you have some examples somewhere?

simonbyrne, Jan 10 '22 22:01

What I had in mind (and roughly what is implemented in #57) is something like the following:

A parameter set file would have the following keys (all optional unless marked as required):

  • name (required): the name of the parameter set: this would become the name of the struct in the code, so should be a valid Julia symbolic name
  • inherits_from: the name of the parameter set to inherit values from
  • parameters: a table of parameters (see below)
  • parameters_include: an array of relative paths (from the current file) of other .toml files containing parameter tables
  • override_values: a table of (parameter key, numeric value) pairs describing inherited parameters which should be modified in this parameter set.

A parameter is an entry in a table. Each key in the table should be a descriptive but valid Julia symbolic name, e.g. MolarMassConstant, and each entry would contain the following keys:

  • description: a long-form description of the constant, along with any necessary references (can be formatted using Julia Markdown).
  • symbol: the standard symbol name used to refer to the parameter
    • should this be required to be unique?
  • units: a string containing the units of the parameter; should be parsable by Unitful.jl.
  • value: the numeric value of the parameter in the current parameter set

Additional keys for UQ (e.g. prior distributions) can also be added. We might also be able to add some mechanism for derived parameters, to support things like https://github.com/CliMA/CLIMAParameters.jl/blob/6f660320358bd2be611f382897aacf3084c66bf1/src/Planet/planet_parameters.jl#L5
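One possible mechanism for derived parameters is to store only base values and compute derived ones with functions, so an override of a base value propagates automatically. The minimal Python sketch below illustrates that idea under stated assumptions: the entry "MolarMassDryAir" and its value (roughly the molar mass of dry air) are hypothetical, and it is not a claim about what the linked Julia code does.

```python
# Base values as they might be read from a parameter file. "MolarMassDryAir" and
# its value are hypothetical entries used only for illustration.
base = {
    "MolarMassConstant": 8.3144598,  # J K^-1 mol^-1, as in the example below
    "MolarMassDryAir": 0.02897,      # kg mol^-1 (hypothetical entry)
}

def gas_constant_dry_air(p: dict) -> float:
    """Derived parameter: specific gas constant of dry air, R / M_d, in J kg^-1 K^-1."""
    return p["MolarMassConstant"] / p["MolarMassDryAir"]

# Overriding a base value changes the derived value with no extra bookkeeping.
overridden = {**base, "MolarMassConstant": 8.5}
```

Because the derived value never has its own file entry, it cannot drift out of sync with an override of the values it depends on.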

Examples

# default.toml
name = "DefaultParameterSet"

parameters_include = [
 "parameters/universal.toml",
 "parameters/planet.toml",
]

# parameters/universal.toml
[MolarMassConstant]
description = "universal gas constant"
symbol = "R"
units = "J*K^-1*mol^-1"
value = 8.3144598

# custom.toml
name = "CustomParameterSet"
inherits_from = "CLIMAParameters.DefaultParameterSet"

# add new parameter
[parameters.Wobble]
description = "wobble rate"
units = "s^-1"
value = 10.2

[override_values]
MolarMassConstant = 8.5

simonbyrne, Jan 10 '22 22:01

This goes in the right direction. @odunbar has more details on what we need. @glwagner and @ilopezgp also have relevant ideas and experience on the calibration/UQ algorithm end (where we need the same files as input).

tapios, Jan 10 '22 22:01

(Image: clima_interface_v2_new)

This figure shows the parameter file content. (As we know, whether we still use a "distribution of parameters" is a work in progress and depends on how the different model components will be written and will communicate in the future.)

The default file is in ClimaParameters. There will be an override file for every experiment we run.

The parameter file format contains the features relevant to Clima (similar to Simon's construction above), plus some information for the calibration pipeline (e.g. the priors). PS: I had included the units as part of the description.

Some high level goals for the parameter file part:

  1. Software side: ideally, the ability to change only runtime parameter values (i.e. we do not need to change the number of overrides etc. at runtime).
  2. User side: (a) parameter names should be unique and verbose in this file, as they may be used across the code; (b) parameters should be plain values, and we will treat "derived parameters" as functions.

e.g.

[zeropointseven]
RunValue = 0.7
ValueType = "real"
Tags = ""
Prior = "fixed"
Transformation = "none"
Description = "This parameter is constant 0.7 [nondimensional]"

odunbar, Jan 11 '22 17:01

A couple of questions I would like to clarify:

  • In CLIMAParameters, will all the parameters be in a single file (if so, this could get quite large and be difficult to navigate), or split over multiple files?
  • What are the potential values of Type?
  • Can you expand on the role of Tags? I still don't quite understand their role.
  • What would the override file look like?

On this last point, depending on how we choose to represent the parameters (#59), it may be helpful to distinguish between information we need to know at compile time (e.g. which parameters are "overrideable"), and information that is only needed at runtime (the actual override values).
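A Python sketch cannot show Julia compile time, but it can mimic the split: one part fixed up front (names, defaults, which entries are overrideable) and one part that is pure runtime data (override values). Everything here is hypothetical, including which parameter is marked overrideable.

```python
from types import MappingProxyType

# Fixed part (analogous to "compile time"): names, defaults, and which of them
# may be overridden. The OVERRIDEABLE choice here is hypothetical.
DEFAULTS = MappingProxyType({"MolarMassConstant": 8.3144598, "zeropointseven": 0.7})
OVERRIDEABLE = frozenset({"MolarMassConstant"})

def values_for_run(overrides: dict[str, float]) -> dict[str, float]:
    """Runtime part: apply override values without changing the set of names."""
    unknown = overrides.keys() - OVERRIDEABLE
    if unknown:
        raise ValueError(f"not overrideable: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```

Since an override can only change values, never the set of names, every run sees the same parameter structure, which is the property a compiled representation would rely on.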

Also, at one point there was some discussion about having a mechanism to output all the parameter values (both overrides and not) that were used in a given experiment. I'm not sure this is feasible, because there isn't an obvious way to track whether a parameter is "used" in a given experiment, but if you store both the Manifest.toml (which will contain the exact version of CLIMAParameters used), and the override file, you should have all the information to reproduce the exact experiment.

simonbyrne, Jan 11 '22 19:01

  1. Let's keep the defaults in CLIMAParameters as one file for now. It may eventually break up naturally, and this is easy to do later without modifying any interfaces (we can also make tools for searching such a file easily).
  2. I was advised that a Type category would be useful. I'll leave it to others to decide its exact purpose; I was basically just thinking of "string" or "number" here.
  3. The Tags are part of a more general design question. They should list all repositories that use this parameter, and we can also have a catch-all category (e.g. "Planet") for basic parameters used everywhere. I believe tags will have many uses down the line with a multi-repo codebase, for example as a way of breaking up the parameter sets, or of easily searching the parameter file.
  4. The override file will look like a list of the parameters (as seen with [zeropointseven]); each entry will replace the default with the same name. For different runs in a calibration, the EKP writes 100 new files, each identical to the original override file but with the "Value = ..." line replaced with a new value.
  5. The parameter log is straightforward. For a run, you can just merge the default and override TOML files (if a parameter is in the override file, you use its values instead). I would advise we do this for every single run at runtime. I take your point regarding CLIMAParameters changing, but in theory CLIMAParameters will only change when a relevant repository also changes, so this is a more general question about how you want to store the exact information of a run with a distributed repo. I think a more frequent use case is people running simulations where some crash or exhibit interesting behaviour due to parameter values etc., and who would like to inspect the recent logs.

odunbar, Jan 11 '22 20:01

To add to what @odunbar says:

  1. We will need multiple files in the end because we want to, for example, run the land model in isolation, and modify its parameters, without having to deal with atmospheric parameters. The multiple (component-model) files can be input into one master file though.
  2. For types, we need float, integer, logical, and string. The precise type of each parameter (e.g., Float32 or Float64) should be determined by the type chosen for the model run (e.g., double or single precision).
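A minimal Python sketch of this split between a declared value type and a run-wide float precision. The type names ("float", "integer", "logical", "string") follow the list above, and "Float32"/"Float64" are passed as plain strings; both are assumptions about how such a file and run configuration might be spelled. Single precision is emulated by rounding through a 4-byte IEEE float with `struct`.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest single-precision value (returned as a Python float)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def convert(value, value_type: str, float_type: str = "Float64"):
    """Check a raw TOML value against its declared type; apply the run's float precision."""
    if value_type == "float":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected a number, got {value!r}")
        return to_float32(float(value)) if float_type == "Float32" else float(value)
    expected = {"integer": int, "logical": bool, "string": str}.get(value_type)
    if expected is None:
        raise ValueError(f"unknown type {value_type!r}")
    # bool is a subclass of int, so reject booleans where integers are expected
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        raise TypeError(f"expected {value_type}, got {value!r}")
    return value
```

Keeping precision out of the file means the same parameter file serves both single- and double-precision runs.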

Logging parameters and other model configuration for each run will be extremely important for being able to use the model effectively. This can just consist of a merge of all the parameter files. The way we have done this in the past is to write a log file with configuration information, into which parameter information is written as the parameter files are parsed.

tapios, Jan 11 '22 20:01

Still, regarding point 1, I would argue for just one file for now, as this partitioning may not be clear until we have everything together.

odunbar, Jan 11 '22 20:01

Are there any examples of non-float parameters?

simonbyrne, Jan 11 '22 21:01

We have used integers in the past, e.g., for different structural choices in parameterization schemes, or to specify the number of updrafts in the EDMF scheme.

tapios, Jan 11 '22 21:01

> Still, regarding point 1, I would argue for just one file for now, as this partitioning may not be clear until we have everything together.

We should have, at minimum, files for

  • Planet (shared by model components, including thermodynamic constants)
  • Atmosphere
  • Ocean
  • Land

If we make it too monolithic, we create barriers for people who do not want to deal with a plethora of parameters that are irrelevant to them.

tapios, Jan 11 '22 21:01

Okay, that's helpful to know.

As a point of reference, here is what @haakon-e, @costachris and @charleskawczynski had been using in TurbulenceConvection.jl: https://github.com/CliMA/TurbulenceConvection.jl/blob/7d58863d5de188b28f92792063169577f745d047/driver/generate_namelist.jl

It also includes some non-parameter options, like file names and timestepping options, but there are a few things to note about how it evolved:

  • it ends up being rather hierarchical, for grouping similar parameters, e.g. https://github.com/CliMA/TurbulenceConvection.jl/blob/7d58863d5de188b28f92792063169577f745d047/driver/generate_namelist.jl#L127-L134
  • the somewhat cryptically-named general_ent_params is actually array-valued: https://github.com/CliMA/TurbulenceConvection.jl/blob/7d58863d5de188b28f92792063169577f745d047/driver/generate_namelist.jl#L101-L105
  • parameter names (at least as they appear in the configuration file) should be descriptive, rather than symbolic
    • this is especially true if we aren't going to use any sort of hierarchical grouping

simonbyrne, Jan 11 '22 22:01

It would be useful to see more examples of how parameters have been used: if you have any, please post them here.

simonbyrne, Jan 11 '22 22:01

Flatter hierarchies may be better. And we do need vector-valued and array-valued parameters (e.g., to specify neural network weights).

tapios, Jan 11 '22 22:01

For NN weights, would it be easier to save these to file and just provide the file name to clima as the parameter?

odunbar, Jan 11 '22 22:01

> The override file will look like a list of the parameters (as seen with [zeropointseven]); each entry will replace the default with the same name. For different runs in a calibration, the EKP writes 100 new files, each identical to the original override file but with the "Value = ..." line replaced with a new value.

If the only information it needs to add is the value for each parameter, why would we need to output all the other information (since that is already available in CLIMAParameters)?

simonbyrne, Jan 11 '22 22:01