ert icon indicating copy to clipboard operation
ert copied to clipboard

Replace c config reading with python

Open eivindjahren opened this issue 2 years ago • 0 comments

The current entry point for getting EnkfMain is:

from res.enkf import EnkfMain, ResConfig

enkf_main = EnkfMain(ResConfig("my_config.ert"))

However, there is an alternative path already in place:

from res.enkf import EnkfMain, ResConfig
config = dict(...)

enkf_main = EnkfMain(ResConfig(config_dict=config))

which allocates ResConfig from a dictionary of the values in the config. The file reading in c is for various reasons both complex, not isolated, and verbose. Also the existing parser will call util_abort on syntax errors. Therefore we could change ResConfig so that the config file is parsed on the python side and then continues along the config_dict code-path later.

Instead of https://github.com/equinor/ert/blob/be5cb726ff67e73a579a5c45806d6b3330e517c5/res/enkf/res_config.py#L105-L124

we could have

    def __init__(
        self, user_config_file=None, config=None, throw_on_error=True, config_dict=None
    ):

        self._assert_input(user_config_file, config, throw_on_error, config_dict)

        self._errors, self._failed_keys = None, None
        
        if user_config_file is not None and config is not None:
           config_dict = parse_config_file(user_config_file, config, throw_on_error)
       
        configs, config_dir = self._alloc_from_dict(
            config_dict=config_dict, throw_on_error=throw_on_error
        )

where parse_config_file is implemented in python.

This could be done in such a way that the parsing is isolated from the rest of the code. The parser would be easily inspectable as it results in a dictionary. Also, if one wanted to, one could confidently add any other config format as an option:

with open("my_config.yaml", "r") as fh:
   config_dict = yaml.safe_load(fh)
enkf_main = EnkfMain(ResConfig(config_dict=config_dict))

as the object is eventually instantiated from that dictionary anyways.

Instantiating from config_dict is known to be buggy, so that will have to be fixed first.

Tasks

The following tasks were proposed with strangler pattern and test driven development in mind, as with most things in life, discretion is advised.

  • [ ] #3847
  • [ ] Test and debug the construction via config dict
  • [ ] Make functionality that resolves defines in config dict
  • [ ] Implement parsing (one keyword at the time) with fallback on the existing parser.
  • [ ] Remove legacy parser and disable c tests which rely on reading configs from c.
  • [ ] Move (one-by-one) previously disabled c tests either to python or rewrite to avoid file reading.

Blocking bugs

  • [x] #2553
  • [x] #2554
  • [x] #2556
  • [x] #2560
  • [x] #2561
  • [ ] #2567
  • [x] #2571
  • [x] #3801
  • [x] #3802

Default values

The design of the config_dict initialization was such that all values should be explicit. Therefore

  • ErtWorkflowList will not add default/system workflows when initializing from dict
  • SiteConfig will not add default/system jobs when initializing from dict
  • QueueConfig will not point to ert/bin/job_dispatch.py if USER_MODE=False and initializing from dict

Therefore it is necessary to add a step to handle default values. The resulting process then goes parse->add system values->initialize config.

Some design concerns with config_dict

  • EclConfig config dict takes date string of the form "1960/12/31" in the END_DATE key and parses it to a CTime. That complicates error-handling, hampers type checking and makes it difficult to give a good error message on malformed dates. It should be a datetime.date.
  • LogConfig config dict takes a res.util.enums.MessageLevelEnum in the LOG_LEVEL key. The names of these values are unfortunately e.g. LOG_CRITICAL etc. so to convert the string in the file (name="CRITICAL") to an enum is done by MessageLevelEnum["LOG_" + name]. Also there is overlap with the levels of the library logging (e.g. logging.CRITICAL). It might simplify things if the config dict contains enum from the library logging and MessageLevelEnum is only used internally in LoggingConfig. A similar concern might be raised with QueueConfig which takes QueueDriverEnum in ConfigKeys.QUEUE_SYSTEM.

Testing the existing construction via config dict (_alloc_from_dict)

In order for the approach to work the config dict needs to produce an equivalent ResConfig object as reading from a corresponding config file, and so this is the most interesting property to test.

The first part would be to produce, given a config_dict, the corresponding config_file_name = config_dict_to_file(config_dict) for which parsing should be inverse: config_dict == parse_config_file(confict_dict_to_file(config_dict)). Then its simply a manner of testing that ResConfig(config_dict_to_file(config_dict)) == ResConfig(config_dict=config_dict).

This property already has quite a bit of testing.

Resolving defines

An ert config file can contain DEFINE A B meaning whenever it encounters A in a filename it should be substituted with B. These defines are collected as {ConfigKeys.DEFINE_KEY: {"A": "B"}} in the config_dict, but calling ResConfig({ConfigKeys.DEFINE_KEY:{"A": "B"}, ConfigKeys.DATA_FILE: "A"}) still thinks the data file is named "A" and not "B". Likely resolving these defines should be a separate step from parsing the file and exactly where it should exist in the pipeline will have to be made consistent with current behavior.

Parsing with fallback

It is quite possible to keep the old parser around as a fallback for as long as it seems necessary:

try:
    config_dict = parse_config_file(user_config_file, config, throw_on_error)
    configs, config_dir = self._alloc_from_dict(
        config_dict=config_dict, throw_on_error=throw_on_error
    )
except ResParsingException, NotImplementedException as e:
    logging.warning("New parser failed with error {}", e)
    configs, config_dir = self._alloc_from_content(
        user_config_file=user_config_file,
        config=config,
        throw_on_error=throw_on_error,
    )

As one necessary part of testing the _alloc_from_dict is comparing the resulting configs list, one can also parse the file twice until the parser is trusted to produce correct results:

config_dict = parse_config_file(user_config_file, config, throw_on_error)
configs, config_dir = self._alloc_from_dict(
    config_dict=config_dict, throw_on_error=throw_on_error
)
configs_legacy, config_dir_legacy = self._alloc_from_content(
    user_config_file=user_config_file,
    config=config,
    throw_on_error=throw_on_error,
)
if configs_legacy != configs or config_dir_legacy != config_dir:
    logging.warning("New parser failed to produce equivalent results")
    configs = configs_legacy
    config_dir_legacy = config_dir

Replace parsing one config object at the time

It is also possible to delegate only some config objects to be parsed in python by injecting the parser in ResConfig._alloc_from_dict. instead of: https://github.com/equinor/ert/blob/2e1e1403ba4c93d202c30e1bfbdde8b839a5598c/res/enkf/res_config.py#L171-L186

one can

    def _alloc_from_content(
        self, user_config_file=None, config=None, throw_on_error=True
    ):
        if user_config_file is not None:
            # initialize configcontent if user_file provided
            parser = ConfigParser()
            config_content = self._alloc_config_content(user_config_file, parser)
            config_dir = config_content.getValue(ConfigKeys.CONFIG_DIRECTORY)
        else:
            config_dir = os.getcwd()
            config_content = self._build_config_content(config)

        if self.errors and throw_on_error:
            raise ValueError("Error loading configuration: " + str(self._errors))
        subst_config = SubstConfig(config_dict=parse_config_file(user_config_file, config, throw_on_error))

eivindjahren avatar Dec 09 '21 14:12 eivindjahren