ClimaParams.jl
TOML schema
I think there has been a general consensus that we should move to a TOML format. The specifics of how it should be specified in this file need to be clarified.
@odunbar did you have some examples somewhere?
What I had been thinking (and is roughly what is implemented in #57) is something like the following:
A parameter set file would have the following keys (all are optional unless marked as required):
- name (required): the name of the parameter set. This would become the name of the struct in the code, so it should be a valid Julia symbolic name.
- inherits_from: the name of the parameter set to inherit values from
- parameters: a table of parameters (see below)
- parameters_include: an array of relative paths (from the current file) of other .toml files containing parameter tables
- override_values: a table of (parameter key, numeric value) pairs describing inherited parameters which should be modified in this parameter set.
A parameter is an entry in a table. Each key in the table should be a descriptive but valid Julia symbolic name, e.g. MolarMassConstant, and each entry would contain the following keys:
- description: a long form description of the constant, and necessary references (can be formatted using Julia Markdown).
- symbol: the standard symbol name used to refer to the parameter
  - should this be required to be unique?
- units: a string containing the units of the parameter; should be parsable by Unitful.jl.
- value: the numeric value of the parameter in the current parameter set
Additional keys for UQ (e.g. prior distributions, etc) can also be added. We might also be able to add some mechanism for derived parameters, to support things like https://github.com/CliMA/CLIMAParameters.jl/blob/6f660320358bd2be611f382897aacf3084c66bf1/src/Planet/planet_parameters.jl#L5
Examples
# default.toml
name = "DefaultParameterSet"
parameters_include = [
"parameters/universal.toml",
"parameters/planet.toml",
]
# parameters/universal.toml
[MolarMassConstant]
description = "universal gas constant"
symbol = "R"
units = "J*K^-1*mol^-1"
value = 8.3144598
# custom.toml
name = "CustomParameterSet"
inherits_from = "CLIMAParameters.DefaultParameterSet"
# add new parameter
[parameters.Wobble]
description = "wobble rate"
units = "s^-1"
value = 10.2
[override_values]
MolarMassConstant = 8.5
This goes in the right direction. @odunbar has more details of what we need. @glwagner and @ilopezgp also have relevant ideas and experiences on the calibration/UQ algorithm end (where we need the same files as input).
This shows the parameter file content (As we know, whether we still do a "distribution of parameters" is WIP and depends on how the different model components will be written/communicate in the future)
The default file is in ClimaParameters. There will be an override file for every experiment we run.
The format of the parameter file contains those features relevant to Clima (similar to Simon's construction above), plus some information for the calibration pipeline (e.g. the priors). PS I had units as part of the description.
Some high level goals for the parameter file part:
- Software side: ideally, the ability to change just runtime parameter values (i.e. we do not need to change the number of overrides etc. at runtime).
- User side: (a) parameter names should be unique and verbose in this file, as they may be used across the code; (b) they should be values, and we will treat "derived parameters" as functions.
e.g.
[zeropointseven]
RunValue = 0.7
ValueType = "real"
Tags=""
Prior = "fixed"
Transformation = "none"
Description = "This parameter is constant 0.7 [nondimensional]"
A couple of questions I would like to clarify:
- In CLIMAParameters, will all the parameters be in a single file (if so, this could get quite large and be difficult to navigate), or split over multiple files?
- What are the potential values of Type?
- Can you expand on the role of Tags? I still don't quite understand their role.
- What would the override file look like?
On this last point, depending on how we choose to represent the parameters (#59), it may be helpful to distinguish between information we need to know at compile time (e.g. which parameters are "overrideable"), and information that is only needed at runtime (the actual override values).
Also, at one point there was some discussion about having a mechanism to output all the parameter values (both overrides and not) that were used in a given experiment. I'm not sure this is feasible, because there isn't an obvious way to track whether a parameter is "used" in a given experiment, but if you store both the Manifest.toml (which will contain the exact version of CLIMAParameters used), and the override file, you should have all the information to reproduce the exact experiment.
- Let's keep the defaults in CLIMAParameters as 1 file for now. In the end it may break up naturally; this is easy to implement later without modifying any interfaces (we can also make tools for searching such a file easily).
- I was advised that a Type category would be useful. I will let people decide what the point is here; I was basically just thinking of "string" or "number".
- The Tags are part of a more general design question. They should list all repositories that use this parameter; we can also have a catch-all category (e.g. "Planet") for basic parameters used everywhere. I believe tags will have many uses down the line with a multi-repo codebase, for example ways of breaking up the parameter sets, or even ways of easily searching the parameter file.
- The override file will look like a list of the parameters (as seen with [zeropointseven]); each entry will replace the default with the same name. For different runs in a calibration, the EKP writes 100 new files, each identical to the original override file but with the "Value = ..." line replaced with a new value.
- The parameter log is straightforward. For a run you can just merge the default and override toml files (if a param is in the override file, you use its values instead). I would advise we do this for every single run at run-time. I take your point RE the CLIMAParameters changing, but in theory CLIMAParameters will only change when a relevant repository is also changed, so this is a more general question about how you want to store the exact information of a run with a distributed repo. I think a more frequent use case is people running simulations where certain ones crash or exhibit interesting behaviour due to parameter values etc., and who would like to inspect the recent logs.
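The calibration step described in the list above (many copies of the override file, differing only in the value line) might look roughly like the following Python sketch. It assumes the key is spelled RunValue, as in the [zeropointseven] example; the function names and the member_NNN.toml file names are hypothetical, and the files are returned as strings rather than written to disk.

```python
import re

# Override file for one parameter, copied from the example in this thread.
OVERRIDE_TEMPLATE = '''[zeropointseven]
RunValue = 0.7
ValueType = "real"
Tags = ""
Prior = "fixed"
Transformation = "none"
Description = "This parameter is constant 0.7 [nondimensional]"
'''


def with_new_value(template, parameter, new_value):
    """Return `template` with the RunValue line of [parameter] replaced."""
    lines = template.splitlines()
    current_table = None
    replaced = False
    for i, line in enumerate(lines):
        # A line consisting only of "[name]" starts a new table.
        header = re.fullmatch(r"\s*\[([^\[\]]+)\]\s*", line)
        if header:
            current_table = header.group(1).strip()
        elif current_table == parameter and re.match(r"\s*RunValue\s*=", line):
            lines[i] = f"RunValue = {new_value!r}"
            replaced = True
    if not replaced:
        raise KeyError(f"no RunValue line found for {parameter!r}")
    return "\n".join(lines) + "\n"


def ensemble_files(template, parameter, values):
    """Map hypothetical file names to override-file text, one per member."""
    return {
        f"member_{k:03d}.toml": with_new_value(template, parameter, v)
        for k, v in enumerate(values, start=1)
    }


files = ensemble_files(OVERRIDE_TEMPLATE, "zeropointseven", [0.65, 0.7, 0.75])
```

Editing the text line by line keeps every other line byte-for-byte identical to the original override file, which matches the description above; a real implementation would more likely go through a TOML writer.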
To add to what @odunbar says:
- We will need multiple files in the end because we want to, for example, run the land model in isolation, and modify its parameters, without having to deal with atmospheric parameters. The multiple (component-model) files can be input into one master file though.
- For types, we need float, integer, logical, string. What type precisely the parameters will have (e.g., float32 or float64) should be determined by the type chosen for the model run (e.g., double or single precision).
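One way to read the type proposal above is that the file records only the broad kind of each parameter, and the run configuration decides the floating-point width. The Python sketch below illustrates that split; the function names and the precision argument are hypothetical, and single precision is emulated with the standard-library struct module only because plain Python has no float32 type.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def coerce(value, value_type, precision="float64"):
    """Check `value` against its declared kind and apply the run precision.

    `value_type` is one of the four kinds named above; `precision` is a
    hypothetical run-level setting, not something stored per parameter.
    """
    if value_type == "float":
        x = float(value)
        return to_float32(x) if precision == "float32" else x
    if value_type == "integer":
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"expected an integer, got {value!r}")
        return value
    if value_type == "logical":
        if not isinstance(value, bool):
            raise TypeError(f"expected a logical, got {value!r}")
        return value
    if value_type == "string":
        if not isinstance(value, str):
            raise TypeError(f"expected a string, got {value!r}")
        return value
    raise ValueError(f"unknown value type {value_type!r}")
```

In Julia the same idea would presumably be expressed by converting float parameters to the model's float type FT when the file is loaded.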
Logging parameters and other model configuration for each run will be extremely important for being able to use the model effectively. This can just consist of a merge of all the parameter files. The way we have done this in the past is to write a log file with configuration information, into which parameter information is written as the parameter files are parsed.
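A minimal sketch of such a merged log, assuming the default and override files have already been parsed into dictionaries keyed by parameter name (for example with a TOML parser). The function names, the parameter names alpha and beta, and the log line format are all made up for illustration; an override entry replaces the default entry of the same name wholesale, as suggested earlier in the thread.

```python
def merge_parameters(defaults, overrides):
    """Return default entries with same-named override entries substituted."""
    merged = {name: dict(entry) for name, entry in defaults.items()}
    for name, entry in overrides.items():
        merged[name] = dict(entry)
    return merged


def log_lines(merged, overrides):
    """Format one log line per parameter, flagging overridden ones."""
    lines = []
    for name in sorted(merged):
        flag = " (override)" if name in overrides else ""
        lines.append(f"{name}: RunValue = {merged[name]['RunValue']!r}{flag}")
    return lines


# Hypothetical parsed files; the names and values are placeholders.
defaults = {
    "alpha": {"RunValue": 1.0, "Prior": "fixed"},
    "beta": {"RunValue": 2.0, "Prior": "fixed"},
}
overrides = {"beta": {"RunValue": 2.5, "Prior": "fixed"}}

merged = merge_parameters(defaults, overrides)
log = log_lines(merged, overrides)
```

Writing these lines at parse time, alongside the rest of the run configuration, would give the per-run record described above.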
Still, regarding point 1. I would still argue for just 1 file for now - as this partitioning is maybe not so clear yet until we have everything together.
Are there any examples of non-float parameters?
We have used integers in the past, e.g., for different structural choices in parameterization schemes, or to specify the number of updrafts in the EDMF scheme.
Still, regarding point 1. I would still argue for just 1 file for now - as this partitioning is maybe not so clear yet until we have everything together.
We should have, at minimum, files for
- Planet (shared by model components, including thermodynamic constants)
- Atmosphere
- Ocean
- Land
If we make it too monolithic, we create barriers to use for people who do not want to deal with a plethora of, for them, irrelevant parameters.
Okay, that's helpful to know.
For a point of reference, here is what @haakon-e, @costachris and @charleskawczynski had been using in TurbulenceConvection.jl: https://github.com/CliMA/TurbulenceConvection.jl/blob/7d58863d5de188b28f92792063169577f745d047/driver/generate_namelist.jl
This also includes some non-parameter options, like file names and timestepping options, but a few things to note about what had evolved:
- it ends up being rather hierarchical, for grouping similar parameters, e.g. https://github.com/CliMA/TurbulenceConvection.jl/blob/7d58863d5de188b28f92792063169577f745d047/driver/generate_namelist.jl#L127-L134
- the somewhat cryptically-named general_ent_params is actually array-valued: https://github.com/CliMA/TurbulenceConvection.jl/blob/7d58863d5de188b28f92792063169577f745d047/driver/generate_namelist.jl#L101-L105
- parameter names (at least as they appear in the configuration file) should be descriptive, rather than symbolic
  - this is especially true if we aren't going to use any sort of hierarchical grouping
It would be useful to see more examples of how parameters have been used: if you have any examples, please post them here.
Flatter hierarchies may be better. And we do need vector-valued and array-valued parameters (e.g., to specify neural network weights).
For NN weights, would it be easier to save these to file and just provide the file name to clima as the parameter?
- The override file will look like a list of the parameters (as seen with [zeropointseven]), it will replace the default with the same name. For different runs in a calibration the EKP writes 100 new files, each look identical to the original override file, but with the "Value = ..." line replaced with a new value.
If the only information it needs to add is the value for each parameter, why would we need to output all the other information (since that is already available in CLIMAParameters)?