
Add Parser.patch and Parser.write methods

Open amicitas opened this issue 5 years ago • 4 comments

Currently, Parser.read is used to perform three different tasks: reading, patching, and writing. This multi-use architecture makes sub-classing or wrapping the class difficult. Splitting this functionality into separate methods would improve extensibility.

Proposal

Split the functionality in Parser.read into the following methods:

  • Parser.read
  • Parser.patch

Also for completeness:

  • Parser.write

In the process, add the following internal methods (a rough sketch of the resulting API follows the list):

  • Parser._read_nml_file
  • Parser._write_nml_file
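
To make this concrete, here is a rough sketch of what I have in mind. This is only an illustration: the _readstream call is a stand-in for the existing token-parser entry point (its signature here is a guess, not the actual internals), and the formatting in write is simplified.

```python
import io

class Parser:
    # Sketch only: _readstream stands in for the existing
    # token-parser entry point; its signature here is a guess.

    def _read_nml_file(self, nml_fname):
        # Internal: slurp the namelist file into memory.
        with open(nml_fname) as f:
            return f.read()

    def _write_nml_file(self, nml_str, out_fname):
        # Internal: write formatted namelist text to disk.
        with open(out_fname, 'w') as f:
            f.write(nml_str)

    def read(self, nml_fname):
        # Public: parse only, no patching or writing.
        nml_str = self._read_nml_file(nml_fname)
        return self._readstream(io.StringIO(nml_str))

    def patch(self, nml_fname, nml_patch, out_fname):
        # Public: parse with a patch and write the patched text out.
        nml_str = self._read_nml_file(nml_fname)
        out = io.StringIO()
        nml = self._readstream(io.StringIO(nml_str), nml_patch, out)
        self._write_nml_file(out.getvalue(), out_fname)
        return nml

    def write(self, nml, out_fname):
        # Public: format an already-parsed namelist and write it
        # (formatting shown as str() only for brevity).
        self._write_nml_file(str(nml), out_fname)
```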

Complications

Currently reading, patching and writing are all done inside the token parser logic, which is started in Parser._readstream. There is no easy way to separate the various functions as they are all part of the fundamental logic of f90nml.

Possible Solution

It is possible to fake the separation of read, parse and write by using StringIO objects in the call to _readstream. However, making this work smoothly requires an intermediate layer in the code.
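
As a minimal sketch of the idea (this assumes Parser.read accepts file-like objects for both the input namelist and the patch output, and the patch group/variable names are made up):

```python
import io
import f90nml

parser = f90nml.Parser()

# Pre-filtering of the raw text could happen here.
with open('input.nml') as f:
    src = io.StringIO(f.read())

patch = {'config_nml': {'n_steps': 500}}   # made-up group/variable
out = io.StringIO()

# Read and patch in one call; the patched text lands in `out`,
# where it can be post-filtered before being written to disk.
nml = parser.read(src, patch, out)
patched_text = out.getvalue()
```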

Advantages

  • User-facing API of f90nml.Parser is clear and easy to subclass or wrap.

Disadvantages

  • Extra layer of code.
  • Requires that the namelist file be read into memory. (I can't actually think of any specific case in which this would be a problem, but I want to make sure that this is considered.)

Motivation

I am only providing this description since I am proposing a major(ish) change to Parser and thought you might like to see a specific use case.

I happen to be working with a scientific code where input.namelist files are used both as input to a Fortran code and by a series of Python-based wrappers. This is enabled by using a special comment specifier, e.g. !ss variable = value, which may or may not be inside a proper namelist group.

(I am not defending this design choice, but it is what I have to work with).
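
For illustration, such a file might look like this (the group and variable names are made up):

```
&physics_nml
    density = 1.0
    !ss solver = 'gmres'
/

!ss plot_style = 'contour'
```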

By creating a subclass of f90nml.Parser I am able to easily read these files by doing the following (sketched in code after the list):

  1. Read the raw namelist file.
  2. Pre-filter the raw file to move all unattached directives into namelist groups.
  3. Pre-filter to rename all the directives to special names (!ss varname -> _specialsauce_varname).
  4. Patch the filtered files using Parser.read, but using StringIO for input and output.
  5. Post-filter the raw output file to return the special variable names to !ss directives.
  6. Write the patched namelist file.
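
A minimal sketch of this workflow (the class name, the regexes, and the _specialsauce_ renaming are just illustrative; step 2's regrouping of unattached directives is omitted for brevity, and it assumes Parser.read accepts file-like objects, as above):

```python
import io
import re
import f90nml

class SpecialSauceParser(f90nml.Parser):
    def patch_special(self, nml_fname, nml_patch, out_fname):
        # 1. Read the raw namelist file.
        with open(nml_fname) as f:
            raw = f.read()
        # 2./3. Pre-filter: rename '!ss name' directives to ordinary
        # variables with special names (group handling omitted here).
        filtered = re.sub(r'!ss\s+(\w+)', r'_specialsauce_\1', raw)
        # 4. Patch via StringIO streams instead of real files.
        out = io.StringIO()
        nml = self.read(io.StringIO(filtered), nml_patch, out)
        # 5. Post-filter: restore the '!ss' directives.
        text = re.sub(r'_specialsauce_(\w+)', r'!ss \1', out.getvalue())
        # 6. Write the patched namelist file.
        with open(out_fname, 'w') as f:
            f.write(text)
        return nml
```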

amicitas avatar Sep 13 '18 16:09 amicitas

As I am sure you can gather from my write-up, this is not a bug, but rather an enhancement request.

amicitas avatar Sep 13 '18 20:09 amicitas

I forgot to mention that I have been (slowly) working on a rewrite of the parser: newparser.

That parser is still incomplete: the core functionality and derived types are working, and vectors with an arbitrary start index are the next hurdle. But it would be good to make sure this is handled in both versions. (This is not your problem, I'm mostly writing this for myself!)

Having said that, I understand your problem a little bit better (handling content within !ss) and I'm not even sure that the current patch method is all that suitable here; it would be even better to have some awareness of these comments. We could, for example, optionally add !ss to the tokenizer and handle such comments separately.

There is some precedent for this with the comment_tokens property of Parser (e.g. MITgcm supports meta-comments starting with #). It would have to be reworked to track the information rather than just discard it, but it might be worth tackling directly.
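
For reference, the existing property can already be extended like this (the filename is made up):

```python
import f90nml

parser = f90nml.Parser()
parser.comment_tokens += '#'      # treat '#' lines as comments too
nml = parser.read('data.mitgcm')  # hypothetical MITgcm-style file
```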

marshallward avatar Sep 22 '18 23:09 marshallward

I do like the idea of allowing the specification of ghost variables, e.g. !gv = 0.0.

But more importantly, it would be nice if the parser class made it easy to do pre-processing just before reading and post-processing just before writing, with parsing in the middle.
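
For example, something like the following (pre_process and post_process are hypothetical names, not existing f90nml API; this generalizes the !ss workflow I sketched above):

```python
import io
import f90nml

class HookedParser(f90nml.Parser):
    # Hypothetical hook points; subclasses override these.
    def pre_process(self, nml_str):
        return nml_str    # rewrite raw text before parsing

    def post_process(self, nml_str):
        return nml_str    # rewrite formatted text before writing

    def read_with_hooks(self, nml_fname, nml_patch, out_fname):
        with open(nml_fname) as f:
            src = self.pre_process(f.read())
        out = io.StringIO()
        nml = self.read(io.StringIO(src), nml_patch, out)
        with open(out_fname, 'w') as f:
            f.write(self.post_process(out.getvalue()))
        return nml
```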

amicitas avatar Sep 23 '18 16:09 amicitas

I agree, it would be good to pre-format the file before handling the parsing. The earlier versions of this module could handle a stream and work with each byte as it came in, but this was often very awkward and caused a lot of the complexity in the current _parse_variable function as more edge cases were revealed (which motivated the rewrite in newparser).

Currently, we are not even doing this anymore, since the tokenizer (which ought to still be capable of handling streams) just dumps all of the tokens into f90lex which is arbitrarily converted into an iterator.

There are grander goals of returning to this streamable method with the tokenizer acting as a true iterator, but I don't know if it was ever really a worthwhile goal, and it has made it harder to do some of the tasks that you need here.

Sorry for the very long answer; this is helping me to rethink some of the original design assumptions, and I am open to introducing some of the changes that you are suggesting.


BTW there could also be a place for introducing new custom tokens like !ss or !gv, but we can have that conversation another time.

marshallward avatar Sep 24 '18 00:09 marshallward