EMAworkbench

Load model configuration from a YAML file

Open jpn-- opened this issue 6 years ago • 9 comments

Load model configuration in a generic way from a YAML file.

jpn-- avatar Feb 14 '19 13:02 jpn--

Do you have an example snippet of a YAML file you currently use?

quaquel avatar Feb 14 '19 13:02 quaquel

I am in the process of cleaning up a whole set of examples, along with the rest of the meta-model codebase we've been working on, to publish in the next couple of days. I will let you know when we put it up.

jpn-- avatar Feb 14 '19 14:02 jpn--

This is quite likely no longer relevant, but here is an option:

import yaml
from ema_workbench import CategoricalParameter, Constant, Model, RealParameter

with open("params.yaml", "r") as f:
    config = yaml.safe_load(f)

# `model` is the model function, defined elsewhere
my_model = Model(name="model", function=model)

constants = config["constants"]
uncertainties = config["uncertainties"]
levers = config["levers"]

# Each uncertainty is a [lower, upper] pair; all levers are folded into one categorical parameter
my_model.constants = [Constant(key, value) for key, value in constants.items()]
my_model.uncertainties = [RealParameter(key, values[0], values[1]) for key, values in uncertainties.items()]
my_model.levers = [CategoricalParameter("my_policy", list(levers.values()))]

And the structure of the params.yaml is as follows:

constants:
  x: 0.28
  y: false
  z: 0.0

levers:
  0: "first"
  1: "second"
  2: "third"

uncertainties:
  a:
  - 1.0
  - 1.5
  b:
  - 0.04
  - 0.07
  c:
  - 0.01
  - 0.03
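Since each uncertainty is a bare two-item list, a swapped or missing bound would only surface later, when the parameter is built. Below is a minimal sketch of a check that could run on the parsed dictionary first. `check_bounds` is a hypothetical helper, not part of the workbench, and the dictionary literal mirrors what the `uncertainties` section above should parse to:

```python
def check_bounds(uncertainties):
    """Return a list of problems found in a {name: [lower, upper]} mapping."""
    problems = []
    for name, bounds in uncertainties.items():
        # Each entry must be a two-item sequence.
        if not isinstance(bounds, (list, tuple)) or len(bounds) != 2:
            problems.append(f"{name}: expected [lower, upper], got {bounds!r}")
            continue
        lower, upper = bounds
        # The lower bound must be strictly below the upper bound.
        if not lower < upper:
            problems.append(f"{name}: lower bound {lower} is not below upper bound {upper}")
    return problems


# Same content as the `uncertainties` section of params.yaml, already parsed.
uncertainties = {"a": [1.0, 1.5], "b": [0.04, 0.07], "c": [0.01, 0.03]}
problems = check_bounds(uncertainties)

# A deliberately swapped pair, to show what a failure looks like.
swapped = check_bounds({"a": [1.5, 1.0]})
```

Collecting all problems in a list, rather than raising on the first one, lets a user fix the whole file in one pass.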

I can sketch a loader if it is of interest.

mikhailsirenko avatar Nov 23 '23 15:11 mikhailsirenko

I want to look at some loaders in general at some point so it is still of interest.

There is also a link between loaders and the persistence of experimental setup metadata. At the moment, I store the uncertainties, etc. in a limited form. Having a nice loader and storage solution that is more expressive than CSV files would be quite useful for reproducibility.

quaquel avatar Nov 23 '23 15:11 quaquel

Perhaps we could start with just YAML? How about:

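import yaml
from ema_workbench import CategoricalParameter, Constant, IntegerParameter, RealParameter


class Loader: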
    @staticmethod
    def load_yaml(file_path):
        with open(file_path, 'r') as file:
            data = yaml.safe_load(file)
        
        uncertainties = []
        constants = []
        levers = []
        
        for parameter, value in data.get('uncertainties', {}).items():
            if 'type' in value:
                if value['type'] == 'real':
                    uncertainties.append(RealParameter(parameter, value['min'], value['max']))
                elif value['type'] == 'integer':
                    uncertainties.append(IntegerParameter(parameter, value['min'], value['max']))
                elif value['type'] == 'categorical':
                    uncertainties.append(CategoricalParameter(parameter, value['categories']))

        for parameter, value in data.get('constants', {}).items():
            constants.append(Constant(parameter, value))

        for parameter, value in data.get('levers', {}).items():
            if 'type' in value:
                if value['type'] == 'real':
                    levers.append(RealParameter(parameter, value['min'], value['max']))
                elif value['type'] == 'integer':
                    levers.append(IntegerParameter(parameter, value['min'], value['max']))
                elif value['type'] == 'categorical':
                    levers.append(CategoricalParameter(parameter, value['categories']))

        return uncertainties, constants, levers

This expects a YAML file in the following format:

uncertainties:
    x1:
        type: real
        min: 0.1
        max: 10
    x2:
        type: real
        min: -0.01
        max: 0.01
    x3:
        type: real
        min: -0.01
        max: 0.01

constants:
    constant1: 10
    constant2: 20

levers:
    lever1:
        type: real
        min: 0.5
        max: 1.5
    lever2:
        type: integer
        min: 1
        max: 5
    lever3:
        type: categorical
        categories: [A, B, C]
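The two near-identical loops in the loader above could share one helper. Here is a sketch of that idea, operating on the already-parsed dictionary. `BUILDERS` and `build_parameters` are hypothetical names, and the namedtuples are stand-ins for the workbench's RealParameter, IntegerParameter, and CategoricalParameter so that the snippet runs on its own. Unlike the loader above, it raises on a missing or unknown type instead of silently skipping the entry:

```python
from collections import namedtuple

# Stand-ins for the ema_workbench parameter classes (assumption: swap in the
# real classes when using this with the workbench).
RealParameter = namedtuple("RealParameter", "name lower upper")
IntegerParameter = namedtuple("IntegerParameter", "name lower upper")
CategoricalParameter = namedtuple("CategoricalParameter", "name categories")

# Map each `type` keyword from the file to a function that builds the parameter.
BUILDERS = {
    "real": lambda name, spec: RealParameter(name, spec["min"], spec["max"]),
    "integer": lambda name, spec: IntegerParameter(name, spec["min"], spec["max"]),
    "categorical": lambda name, spec: CategoricalParameter(name, spec["categories"]),
}


def build_parameters(section):
    """Turn a {name: spec} mapping into a list of parameter objects."""
    parameters = []
    for name, spec in section.items():
        kind = spec.get("type")
        if kind not in BUILDERS:
            raise ValueError(f"{name}: unknown or missing parameter type {kind!r}")
        parameters.append(BUILDERS[kind](name, spec))
    return parameters


# The `levers` section of the example file, written out as a parsed dictionary.
levers = build_parameters({
    "lever1": {"type": "real", "min": 0.5, "max": 1.5},
    "lever2": {"type": "integer", "min": 1, "max": 5},
    "lever3": {"type": "categorical", "categories": ["A", "B", "C"]},
})
```

The same function would serve both the `uncertainties` and the `levers` section, and supporting a new parameter type only needs one more entry in `BUILDERS`.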

mikhailsirenko avatar Nov 23 '23 17:11 mikhailsirenko

This is an interesting discussion. It does provide more flexibility, but it does conflict a bit with PEP 20.

There should be one-- and preferably only one --obvious way to do it.

Currently that would be a (nested) dictionary, right? I think there is enough software to convert yaml to dictionaries. So the question is if we also want to do that internally. It would be something to document, test, provide examples for, etc.

EwoutH avatar Nov 23 '23 19:11 EwoutH

Interesting point! So, what is this "one-- and preferably only one --obvious way to do it" now?

mikhailsirenko avatar Nov 24 '23 08:11 mikhailsirenko

I thought about this a bit. Personally, I think building, supporting, and maintaining our own YAML reader is quite far from the essence of this library. However, it would be useful to have a one-liner to assign a dictionary to a model as configuration.

The most elegant way to do it currently is probably like this:

from ema_workbench import CategoricalParameter, Constant, IntegerParameter, RealParameter

# Combined dictionary for uncertainties, constants, and levers
model_elements = {
    'uncertainties': {
        'x1': RealParameter('x1', 0.1, 10),
        'x2': RealParameter('x2', -0.01, 0.01),
        'x3': RealParameter('x3', -0.01, 0.01)
    },
    'constants': {
        'constant1': Constant('constant1', 10),
        'constant2': Constant('constant2', 20)
    },
    'levers': {
        'lever1': RealParameter('lever1', 0.5, 1.5),
        'lever2': IntegerParameter('lever2', 1, 5),
        'lever3': CategoricalParameter('lever3', ['A', 'B', 'C'])
    }
}

from ema_workbench import Model

# `model_function` is the model function, defined elsewhere
my_model = Model('my_model', function=model_function)

my_model.uncertainties, my_model.constants, my_model.levers = (
    list(model_elements[key].values()) for key in ['uncertainties', 'constants', 'levers']
)

I agree that this is not ideal. I do think a dictionary is the obvious intermediate format for loading model configurations. That leaves three problems:

  1. Parsing other file formats (YAML, TOML, CSV) to a dict. Personally, I would consider that out of scope for the workbench, since plenty of existing software already does that (PyYAML, tomllib).
  2. Standardizing an accepted dictionary format.
    • Do we want to keep classes (like RealParameter) or use keywords (like "real")?
    • The name is currently duplicated (once as the dictionary key, once as the constructor argument). Can we simplify that?
  3. Adding a method to the Model class that accepts a configuration dictionary. Could be something like (depending on the standardized dict format):
    def assign_elements(self, elements):
        """
        Assign uncertainties, constants, and levers from a dictionary to the model.

        :param elements: A dictionary containing the uncertainties, constants, and levers.
        """
        if 'uncertainties' in elements:
            self.uncertainties = list(elements['uncertainties'].values())

        if 'constants' in elements:
            self.constants = list(elements['constants'].values())

        if 'levers' in elements:
            self.levers = list(elements['levers'].values())
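
On the duplicated name in point 2, one option is to keep a flat list per group and derive the dictionary keys from each element's name. This is a sketch with a stand-in class: the `Parameter` namedtuple and `by_name` are hypothetical, and the only assumption about real workbench elements is that each exposes a `name` attribute.

```python
from collections import namedtuple

# Stand-in for a workbench parameter; only the `name` attribute matters here.
Parameter = namedtuple("Parameter", "name lower upper")


def by_name(elements):
    """Index a list of elements by name, rejecting duplicate names."""
    indexed = {}
    for element in elements:
        if element.name in indexed:
            raise ValueError(f"duplicate name: {element.name}")
        indexed[element.name] = element
    return indexed


# Each name is written once; the dictionary key is derived from it.
uncertainties = by_name([
    Parameter("x1", 0.1, 10),
    Parameter("x2", -0.01, 0.01),
])
```

With this, the key and the constructor argument cannot drift apart, and a duplicate name fails loudly instead of overwriting an earlier entry.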

EwoutH avatar Nov 24 '23 13:11 EwoutH

There are currently parameters_to_csv and parameters_from_csv functions. To my knowledge, these functions are not really used, are not actively maintained, and are not covered by any unit tests. The main problem is that CSV is not expressive enough to capture the full richness that you have with parameters. For example, it is tricky to capture non-uniform distributions, which are supported by the workbench.

Going to another format might solve this problem. For me, a key question is one of scope. Do we want to use this to expose the full functionality of the workbench for specifying parameters and outcomes, or do we want to cover only a subset?

I would like to see a more comprehensive way of storing data on the exact experimental setup in the results that are stored. Currently, a JSON file that contains a small subset of this information is created and stored as part of the tarball. However, this only contains the name and class of each parameter. So, no information on ranges, categories, etc., is stored. Likewise, no information is stored on the sampling scheme that was used (this is an issue separate from YAML files but highlights the importance of providing provenance). If we can agree on the scope of the YAML or other markup language, we might also use this as part of the current storage approach.
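
To make that concrete, here is a sketch of what a richer setup record could look like as JSON. The field names (`sampler`, `n_experiments`, `uncertainties`), the `describe_setup` helper, and the example values are assumptions for illustration, not the format the workbench currently writes. The namedtuples stand in for workbench parameters.

```python
import json
from collections import namedtuple

# Stand-ins for workbench parameter classes (assumption, for a runnable sketch).
RealParameter = namedtuple("RealParameter", "name lower upper")
CategoricalParameter = namedtuple("CategoricalParameter", "name categories")


def describe_setup(uncertainties, sampler, n_experiments):
    """Build a JSON-serializable record of parameters and sampling scheme."""
    return {
        "sampler": sampler,
        "n_experiments": n_experiments,
        # _asdict() keeps bounds and categories; the class name records the type.
        "uncertainties": [
            {"class": type(p).__name__, **p._asdict()} for p in uncertainties
        ],
    }


record = describe_setup(
    [RealParameter("x1", 0.1, 10), CategoricalParameter("x4", ["A", "B"])],
    sampler="latin hypercube",
    n_experiments=1000,
)

# Round trip through JSON text, as it would be stored next to the results.
text = json.dumps(record, indent=2)
restored = json.loads(text)
```

Because the record is a plain nested dictionary, the same structure could be written as YAML instead of JSON once the scope of the format is agreed on.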

quaquel avatar Nov 24 '23 16:11 quaquel