EMAworkbench
Load model configuration from a YAML file
Load model configuration in a generic way from a YAML file.
Do you have an example snippet of a YAML file you currently use?
I am in the process of cleaning up a whole mess of examples and the rest of the meta-model codebase we've been working on, to publish in the next couple of days. Will let you know when we put it up.
Quite likely no longer relevant, but here is an option:
import yaml

from ema_workbench import CategoricalParameter, Constant, Model, RealParameter

with open("params.yaml", "r") as f:
    config = yaml.safe_load(f)

# model refers to the user's model function, defined elsewhere
my_model = Model(name="model", function=model)

constants = config["constants"]
uncertainties = config["uncertainties"]
levers = config["levers"]

my_model.constants = [Constant(key, value) for key, value in constants.items()]
my_model.uncertainties = [RealParameter(key, bounds[0], bounds[1]) for key, bounds in uncertainties.items()]
# all lever values are collapsed into a single categorical policy parameter
my_model.levers = [CategoricalParameter("my_policy", [value for _, value in levers.items()])]
And the structure of params.yaml is as follows:
constants:
  x: 0.28
  y: false
  z: 0.0
levers:
  0: "first"
  1: "second"
  2: "third"
uncertainties:
  a:
    - 1.0
    - 1.5
  b:
    - 0.04
    - 0.07
  c:
    - 0.01
    - 0.03
I can sketch a loader if it is of interest.
I want to look at some loaders in general at some point so it is still of interest.
There is also a link between loaders and the persistence of experimental setup metadata. At the moment, I store the uncertainties, etc., in a limited form. Having a nice loader and storage solution that is expressively richer than CSV files would be quite useful for reproducibility.
Perhaps let's start with just YAML? How about:
import yaml

from ema_workbench import CategoricalParameter, Constant, IntegerParameter, RealParameter


class Loader:
    @staticmethod
    def load_yaml(file_path):
        # parse the YAML file into a nested dictionary
        with open(file_path, 'r') as file:
            data = yaml.safe_load(file)

        uncertainties = []
        constants = []
        levers = []

        # uncertainties and levers are typed; map each type keyword to a parameter class
        for parameter, value in data.get('uncertainties', {}).items():
            if value.get('type') == 'real':
                uncertainties.append(RealParameter(parameter, value['min'], value['max']))
            elif value.get('type') == 'integer':
                uncertainties.append(IntegerParameter(parameter, value['min'], value['max']))
            elif value.get('type') == 'categorical':
                uncertainties.append(CategoricalParameter(parameter, value['categories']))

        # constants are plain name/value pairs
        for parameter, value in data.get('constants', {}).items():
            constants.append(Constant(parameter, value))

        for parameter, value in data.get('levers', {}).items():
            if value.get('type') == 'real':
                levers.append(RealParameter(parameter, value['min'], value['max']))
            elif value.get('type') == 'integer':
                levers.append(IntegerParameter(parameter, value['min'], value['max']))
            elif value.get('type') == 'categorical':
                levers.append(CategoricalParameter(parameter, value['categories']))

        return uncertainties, constants, levers
For a yaml of the following format:
uncertainties:
  x1:
    type: real
    min: 0.1
    max: 10
  x2:
    type: real
    min: -0.01
    max: 0.01
  x3:
    type: real
    min: -0.01
    max: 0.01
constants:
  constant1: 10
  constant2: 20
levers:
  lever1:
    type: real
    min: 0.5
    max: 1.5
  lever2:
    type: integer
    min: 1
    max: 5
  lever3:
    type: categorical
    categories: [A, B, C]
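A minimal usage sketch, assuming the Loader class above and a file named config.yaml in this format; model_function stands in for the user's model function:

from ema_workbench import Model

# parse the YAML file and attach the resulting parameter lists to a model
uncertainties, constants, levers = Loader.load_yaml("config.yaml")

my_model = Model("my_model", function=model_function)
my_model.uncertainties = uncertainties
my_model.constants = constants
my_model.levers = levers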
This is an interesting discussion. It does provide more flexibility, but it conflicts a bit with PEP 20:
There should be one-- and preferably only one --obvious way to do it.
Currently that would be a (nested) dictionary, right? I think there is enough software to convert YAML to dictionaries. So the question is whether we also want to do that internally. It would be something to document, test, provide examples for, etc.
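For illustration only: both PyYAML and the standard library's tomllib already return plain nested dictionaries, so the parsing step itself needs nothing from the workbench (the file names here are placeholders):

import tomllib  # standard library since Python 3.11
import yaml     # PyYAML

with open("params.yaml", "r") as f:
    config_from_yaml = yaml.safe_load(f)  # nested dict

with open("params.toml", "rb") as f:      # tomllib requires binary mode
    config_from_toml = tomllib.load(f)    # nested dict with the same shape, given equivalent content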
Interesting point! So, what is this "one-- and preferably only one --obvious way to do it" now?
I thought about this a bit. Personally, I think building our own YAML reader and supporting and maintaining it is quite far from the essence of this library. However, it would be useful to have a one-liner to assign a dictionary to a model as configuration.
The most elegant way to do it currently is probably like this:
from ema_workbench import CategoricalParameter, Constant, IntegerParameter, Model, RealParameter

# Combined dictionary for uncertainties, constants, and levers
model_elements = {
    'uncertainties': {
        'x1': RealParameter('x1', 0.1, 10),
        'x2': RealParameter('x2', -0.01, 0.01),
        'x3': RealParameter('x3', -0.01, 0.01)
    },
    'constants': {
        'constant1': Constant('constant1', 10),
        'constant2': Constant('constant2', 20)
    },
    'levers': {
        'lever1': RealParameter('lever1', 0.5, 1.5),
        'lever2': IntegerParameter('lever2', 1, 5),
        'lever3': CategoricalParameter('lever3', ['A', 'B', 'C'])
    }
}

my_model = Model('my_model', function=model_function)

# unpack the dictionary values into the three model attributes
my_model.uncertainties, my_model.constants, my_model.levers = (
    list(model_elements[key].values()) for key in ['uncertainties', 'constants', 'levers']
)
I agree that this is not ideal. I do think a dictionary is the obvious (in-between) way to load model configurations. That leaves three problems:
- Parsing other file formats (YAML, TOML, CSV) to a dict. Personally, I would see that as out of scope for the workbench, since there is plenty of software that already does that (PyYAML, tomllib).
- Standardizing an accepted dictionary format.
  - Do we want to keep classes (like RealParameter) or use keywords (like "real")?
  - The name is currently duplicated; can we simplify that?
- Adding a method to the Model class that accepts a configuration dictionary. Could be something like the following (depending on the standardized dict format):
def assign_elements(self, elements):
    """
    Assign uncertainties, constants, and levers from a dictionary to the model.

    :param elements: A dictionary containing the uncertainties, constants, and levers.
    """
    if 'uncertainties' in elements:
        self.uncertainties = list(elements['uncertainties'].values())
    if 'constants' in elements:
        self.constants = list(elements['constants'].values())
    if 'levers' in elements:
        self.levers = list(elements['levers'].values())
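Reusing the model_elements dictionary from above, and assuming assign_elements has been added to the Model class (it is only a sketch here, not an existing method), usage would reduce to a one-liner:

from ema_workbench import Model

# model_function is a placeholder; assign_elements is the sketched method above
my_model = Model('my_model', function=model_function)
my_model.assign_elements(model_elements)  # fills uncertainties, constants, and levers in one call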
There currently exist a parameters_to_csv and a parameters_from_csv function. These functions are not really used to my knowledge, are not actively maintained, and are not covered by any unit tests. The main problem is that CSV is not expressive enough to capture the full richness that you have with parameters. For example, it is tricky to capture non-uniform distributions, which are supported by the workbench.
Going to another format might solve this problem. For me, a key question is one of scope. Do we want to use this to expose the full functionality of the workbench for specifying parameters and outcomes, or do we want to cover only a subset?
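As an illustration of what a richer format could express, the snippet below adds distribution information to the keyword-based YAML format discussed above; the distribution and mode fields are hypothetical, not an existing workbench schema:

uncertainties:
  x1:
    type: real
    distribution: triangular  # hypothetical keyword for a non-uniform distribution
    min: 0.1
    mode: 2.0
    max: 10
  x2:
    type: real                # no distribution given, so uniform by default
    min: -0.01
    max: 0.01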
I would like to see a more comprehensive way of storing data on the exact experimental setup alongside the results that are stored. Currently, a JSON file that contains a small subset of this information is created and stored as part of the tarball. However, this only contains the name and class of each parameter, so no information on ranges/categories, etc., is stored. Likewise, no information is stored on the sampling scheme that was used (this is an issue separate from YAML files, but it highlights the importance of providing provenance). If we can agree on the scope of the YAML or other markup language, we might also use this as part of the current storage approach.
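A minimal sketch of that storage direction, assuming the keyword-based configuration dictionary is available at save time; the file names and the extra sampling fields are illustrative only:

import json

import yaml

# the same dictionary that configured the experiment doubles as provenance metadata
with open("params.yaml", "r") as f:
    setup = yaml.safe_load(f)

# hypothetical extra provenance fields describing the sampling scheme
setup["sampling"] = {"scheme": "lhs", "n_scenarios": 1000}

# stored alongside the results, e.g. as an additional file inside the tarball
with open("experiment_setup.json", "w") as f:
    json.dump(setup, f, indent=2)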