ert
ert copied to clipboard
Replace c config reading with python
The current entry point for getting EnkfMain
is:
from res.enkf import EnkfMain, ResConfig
enkf_main = EnkfMain(ResConfig("my_config.ert"))
However, there is an alternative path already in place:
from res.enkf import EnkfMain, ResConfig
config = dict(...)
enkf_main = EnkfMain(ResConfig(config_dict=config))
which allocates ResConfig
from a dictionary of the values in the config. The file reading in c
is for various reasons both complex, not isolated, and verbose. Also the existing parser will call util_abort
on syntax errors. Therefore we could change ResConfig
so that the config file is parsed on the python side and then continues along the config_dict
code-path later.
Instead of https://github.com/equinor/ert/blob/be5cb726ff67e73a579a5c45806d6b3330e517c5/res/enkf/res_config.py#L105-L124
we could have
def __init__(
self, user_config_file=None, config=None, throw_on_error=True, config_dict=None
):
self._assert_input(user_config_file, config, throw_on_error, config_dict)
self._errors, self._failed_keys = None, None
if user_config_file is not None and config is not None:
config_dict = parse_config_file(user_config_file, config, throw_on_error)
configs, config_dir = self._alloc_from_dict(
config_dict=config_dict, throw_on_error=throw_on_error
)
where parse_config_file
is implemented in python.
This could be done in such a way that the parsing is isolated from the rest of the code. The parser would be easily inspectable as it results in a dictionary. Also, if one wanted to, one could confidently add any other config format as an option:
with open("my_config.yaml", "r") as fh:
config_dict = yaml.safe_load(fh)
enkf_main = EnkfMain(ResConfig(config_dict=config_dict))
as the object is eventually instantiated from that dictionary anyways.
Instantiating from config_dict
is known to be buggy, so that will have to be fixed first.
Tasks
The following tasks were proposed with strangler pattern and test driven development in mind, as with most things in life, discretion is advised.
- [ ] #3847
- [ ] Test and debug the construction via config dict
- [ ] Make functionality that resolves defines in config dict
- [ ] Implement parsing (one keyword at the time) with fallback on the existing parser.
- [ ] Remove legacy parser and disable c tests which rely on reading configs from c.
- [ ] Move (one-by-one) previously disabled c tests either to python or rewrite to avoid file reading.
Blocking bugs
- [x] #2553
- [x] #2554
- [x] #2556
- [x] #2560
- [x] #2561
- [ ] #2567
- [x] #2571
- [x] #3801
- [x] #3802
Default values
The design of the config_dict
initialization was such that all values should be explicit. Therefore
-
ErtWorkflowList
will not add default/system workflows when initializing from dict -
SiteConfig
will not add default/system jobs when initializing from dict -
QueueConfig
will not point toert/bin/job_dispatch.py
ifUSER_MODE=False
and initializing from dict
Therefore it is necessary to add a step to handle default values. The resulting process then goes parse->add system values->initialize config
.
Some design concerns with config_dict
-
EclConfig
config dict takes date string of the form "1960/12/31" in theEND_DATE
key and parses it to a CTime. That complicates error-handling, hampers type checking and makes it difficult to give a good error message on malformed dates. It should be adatetime.date
. -
LogConfig
config dict takes ares.util.enums.MessageLevelEnum
in theLOG_LEVEL
key. The names of these values are unfortunately e.g.LOG_CRITICAL
etc. so to convert the string in the file (name="CRITICAL"
) to an enum is done byMessageLevelEnum["LOG_" + name]
. Also there is overlap with the levels of the librarylogging
(e.g.logging.CRITICAL
). It might simplify things if the config dict contains enum from the librarylogging
andMessageLevelEnum
is only used internally inLoggingConfig
. A similar concern might be raised withQueueConfig
which takesQueueDriverEnum
inConfigKeys.QUEUE_SYSTEM
.
Testing the existing construction via config dict (_alloc_from_dict
)
In order for the approach to work the config dict needs to produce an equivalent ResConfig
object as reading from a corresponding config file, and so this is the most interesting property to test.
The first part would be to produce, given a config_dict
, the corresponding config_file_name = config_dict_to_file(config_dict)
for which parsing should be inverse: config_dict == parse_config_file(confict_dict_to_file(config_dict))
. Then its simply a manner of testing that ResConfig(config_dict_to_file(config_dict)) == ResConfig(config_dict=config_dict)
.
This property already has quite a bit of testing.
Resolving defines
An ert config file can contain DEFINE A B
meaning whenever it encounters A
in a filename it should be substituted with B
. These defines are collected as {ConfigKeys.DEFINE_KEY: {"A": "B"}}
in the config_dict, but calling ResConfig({ConfigKeys.DEFINE_KEY:{"A": "B"}, ConfigKeys.DATA_FILE: "A"})
still thinks the data file is named "A" and not "B". Likely resolving these defines should be a separate step from parsing the file and exactly where it should exist in the pipeline will have to be made consistent with current behavior.
Parsing with fallback
It is quite possible to keep the old parser around as a fallback for as long as it seems necessary:
try:
config_dict = parse_config_file(user_config_file, config, throw_on_error)
configs, config_dir = self._alloc_from_dict(
config_dict=config_dict, throw_on_error=throw_on_error
)
except ResParsingException, NotImplementedException as e:
logging.warning("New parser failed with error {}", e)
configs, config_dir = self._alloc_from_content(
user_config_file=user_config_file,
config=config,
throw_on_error=throw_on_error,
)
As one necessary part of testing the _alloc_from_dict
is comparing the resulting configs
list, one can also parse the file twice until the parser is trusted to produce correct results:
config_dict = parse_config_file(user_config_file, config, throw_on_error)
configs, config_dir = self._alloc_from_dict(
config_dict=config_dict, throw_on_error=throw_on_error
)
configs_legacy, config_dir_legacy = self._alloc_from_content(
user_config_file=user_config_file,
config=config,
throw_on_error=throw_on_error,
)
if configs_legacy != configs or config_dir_legacy != config_dir:
logging.warning("New parser failed to produce equivalent results")
configs = configs_legacy
config_dir_legacy = config_dir
Replace parsing one config object at the time
It is also possible to delegate only some config objects to be parsed in python by injecting the parser in ResConfig._alloc_from_dict
. instead of: https://github.com/equinor/ert/blob/2e1e1403ba4c93d202c30e1bfbdde8b839a5598c/res/enkf/res_config.py#L171-L186
one can
def _alloc_from_content(
self, user_config_file=None, config=None, throw_on_error=True
):
if user_config_file is not None:
# initialize configcontent if user_file provided
parser = ConfigParser()
config_content = self._alloc_config_content(user_config_file, parser)
config_dir = config_content.getValue(ConfigKeys.CONFIG_DIRECTORY)
else:
config_dir = os.getcwd()
config_content = self._build_config_content(config)
if self.errors and throw_on_error:
raise ValueError("Error loading configuration: " + str(self._errors))
subst_config = SubstConfig(config_dict=parse_config_file(user_config_file, config, throw_on_error))