ert
ert copied to clipboard
Improvement when parsing ert-config
When parsing arguments for forward model jobs or workflow jobs we see some strange behaviour when escaping certain characters:
- spaces works only in double quotes
- commas works only with single quotes
Only one should be necessary
FORWARD_MODEL REPLACE_STRING(<FROM>='some, thing', <TO>="some stuff", <FILE>=file.txt) : success
FORWARD_MODEL REPLACE_STRING(<FROM>='some, thing', <TO>='some stuff', <FILE>=file.txt) : -> some stuff becomes somestuff
FORWARD_MODEL REPLACE_STRING(<FROM>="some, thing", <TO>="some stuff", <FILE>=file.txt) : util abort
I am not sure this is possible to solve without major breaking of current config parsing.
Spaces outside of quotes also behaves oddly. For instance:
-
<OPTIONAL_ARGUMENT>= stuff
will be ignored and -
<FAULTY_ARGUMENT>=part-included part-excluded
will have the second part ignored.
I have added these 5 examples to the test for the C function that parses these strings, and they work fine.
One of the following things could be going on:
- That C function (
subst_list_add_from_string
) is maybe not used anymore. - The string is pre-processed somewhere before calling
subst_list_add_from_string
. - The string is post-processed somewhere after calling
subst_list_add_from_string
.
I will try to figure out which one it is...
It seems to be number 2. subst_list_add_from_string
receives a modified string. Somewhere the arguments are split and then put together again, incorrectly.
It turns out that when the config file is read, the lines are split by space and double quotes ("
) are removed. Later when a FORWARD_MODEL
key is detected, the parts are simply concatenated before further processing. So, in the process double quotes and whitespace are lost.
It will be a bit of a job to fix that while making sure that nothing breaks elsewhere:
The splitting happens in config_parse__
in config_parser.cpp
, using a basic parser from libecl
.
The re-joining happens in forward_model_parse_job_deprecated_args
in forward_model.cpp
.
Some options for fixing:
- Disable the splitting. This will require hunting down all places in the code where a split string is expected and making adjustments. A lot of work with the risk of missing some cases.
- Make the split smarter by only removing double quotes at the beginning and end, and afterwards re-joining with a space. This would have the effect of retaining the input string with all whitespace replaced with a single space. This solution could have unexpected side-effects within other parts of the code.
- ~~Replace all double quotes with single quotes before parsing, same effects as previous point, and assumes that double quotes are equivalent to single quotes.~~
- ~~Disallow double quotes (reporting an error), resulting in the same effect of whitespace being replaced with a single space.~~
- Store the raw string along with the split string and use that with the
FORWARD_MODEL
key instead of the re-joined string. Should be safe, but it is somewhat inelegant to drag along both raw and split strings in the configuration object.
Double quotes are necessary in https://fmu-docs.equinor.com/docs/ert/reference/forward_models.html#ECL2CSV-examples in order to escape the comment characters --
.
Will be fixed as part of rewriting the config parser. This issue should be used as reference to things we need to fix as part of the work.
There is a prototype for a new parser in #4720 where a test test_quotations_in_forward_model_arglist
is added to document the new possible behaviour for this.
This one solved now with the merge of the new parsing @mortalisk ?
This will be solved when the new parser replacing the old one and is used by default. Not actionable issue