Implement BoutOptionsFile
We want to fully integrate a class structure storing all input file options.
We can't just use boutdata.BoutOptionsFile though, because it uses boututils.DataFile in places, so it will need at least a partial rewrite to work with xBOUT.
I actually tried to see if there was a way to rewrite BoutOptionsFile completely (aiming for a much shorter, less home-grown implementation), but the unique structure of BOUT.inp files foiled me. BOUT.inp files are similar to .ini files and YAML files, but have distinguishing properties which make handling them with variations of standard python library config classes difficult:
-
Missing initial section header. This is annoying but solvable - you can still read and write
.inpfiles withconfigparser.ConfigParserbut you have to special-case the first line. -
Nested sections. This rules out easily subclassing
configparser.ConfigParser- I thought maybe you could overrideConfigParser.sections()with a recursive lookup method but asConfigParserinherits its structure fromcollections.MutableMappingI couldn't see any easy way to make this work. -
Specifying nesting in header names. The
[section:subsection]syntax isn't shared by any commonly-used config file structure I could find. I think this rules out usingconfigobj.ConfigObj, which is a shame because that one supports nesting (and nested substitution).
While trying out these things I did write some tests though, so we could use them (boutdata.BoutOptionsFile currently has no direct tests).
Therefore I suggest what we should do is:
-
Write some tests defining the expected behaviour of the options file class (I have some already, which I will submit later),
-
Copy over the basic parts of the
boutdata.BoutOptionsFilecode toxBOUT, -
Reimplement the methods which rely on
boututils.DataFile.
This should hopefully allow us to improve the boutdata.BoutOptionsFile code if needed but without breaking backwards compatibility.
Alternatively, if we are okay with breaking backwards compatibility, there are lots of ways in which BoutOptionsFile could be improved...
(By breaking backwards compatibility I don't mean that the BOUT.inp structure would have to be changed, I just mean you wouldn't be able to just drop in xbout.BoutOptionsFile for boutdata.BoutOptionsFile and not have to change the syntax of any calls.)
Here's a list of things which I think an updated version of BoutOptionsFile could feature, or at least should do differently from the current boutdata.BoutOptionsFile:
I/O
-
Should have the option to read grid size data from a
xarray.Datasetobject provided by xBOUT. -
Should use
pathlib.Path()under the hood. (pathlibis so much better thanos, it even gives automatic compatibility with Windows-style paths) -
Make casting to lowercase optional. BOUT in general should not do this in my opinion, in case one of the values in the input file is a path.
-
Ability to read from and write to a nested dictionary format. Potentially useful for making tests which don't have to read or write files, and testing
evalfeatures. -
__str__could have an option to print out as a nested dictionary (potentially by usingprettyprint.PrettyPrinter). This would then be a serializable representation of the object.
Evaluation and substitution
-
A
.get()method with optional arguments for evaluation. The default should be to evaluate the function in the file, then opt-out with.get(key, eval=False). -
Also store in-line comments, and then optionally strip them when fetching with
.get(key, comment=False)(defaulting toTrue). That would allow near-roundtripping: writing theBoutOptionsFileback out to a new.inpfile would then preserve the original inline comments. It would be super-helpful when finding an old dataset for it to still have the comments in the input file. (You might even consider a scheme to keep the lone comments as well as the inline ones...) -
configparserhas classes for this, which could be used as inspiration (they call it "interpolation").
Indexing
-
The
.keys()method returns a counterintuitive result: it returns a list where the first half issectionsand the second half iskeys. It should just return thekeysfor that section. -
__iter__is similar, it first yields the sections, then yields the keys.BoutOptionsfundamentally has a dictionary structure:__iter__should yield(key, section)pairs. -
There should be an
.items(section=None)method, which returns key, value pairs in a section. -
The
getSection()method should not also be a setter - currently if it doesn't find a section in the input file it will create one. This should instead throw aKeyError, and require a newset_section()method. AlsogetSection()->get_section()for PEP8 compliance. -
Better error handling when indexing:
configparserhas lots of more specific errors which could be imported and thrown when appropriate.
Code structure
-
It needs tests, including of evaluation of different types.
-
It should have a
__repr__method, which I think should just returnBoutOptionsFile('/path/to/BOUT.inp') -
.path()is a potentially-confusing method name for an object which also has an attribute storing the path to a file. (Could call it.route()or.branch()?) -
boutdata.BoutOptionsimplements nesting by storing itself, but it would be clearer to have a top-levelBoutOptionsParserclass, which could read from dicts, and is subclassed asBoutOptionsFile. This parser would contain a nested dictionary of instances of aSectionclass.
I'm not saying we should do all this, or do any of it now, but I wanted to write down all the ways I think it could be improved in one place, in case they are helpful in future.
(xref https://github.com/boutproject/BOUT-dev/issues/304 and https://github.com/boutproject/BOUT-dev/pull/1066)
I've just read BOUT's documentation on the input file in detail, and apparently BOUT++ does several weird things when parsing that I was not aware of. These all will have to also be in any xbout.OptionsFile parser otherwise your analysis might not be consistent with your data.
- Detect escaped arithmetic symbols (
+-*/^), dot (.), or brackets ((){}[]) in option names - Detect escaping through backquotes in option names (
`2ndvalue`) - Evaluate
numletterasnum*letter(andnumbracketasnum*bracket) - Substitute unicode representation of pi
- Ensure option names don't contain
:or= - Ensure all expressions are rounded to nearest integer. I'm very suprised by this, I did not think
dx = Lx/nxevaluated to an integer internally... Does BOUT actually do this @johnomotani ??
Ensure all expressions are rounded to nearest integer. I'm very suprised by this, I did not think dx = Lx/nx evaluated to an integer internally... Does BOUT actually do this
I didn't think so... is it only in options that are being read as ints in the C++ code?
I'm not sure I'm understanding the documentation correctly, what it says here is
All expressions are calculated in floating point and then converted to an integer when read inside BOUT++. The conversion is done by rounding to the nearest integer, but throws an error if the floating point value is not within (1e-3) of an integer. This is to minimise unexpected behaviour. If you want to round any result to an integer, use the round function
I think the documentation needs re-phrasing. I believe it means:
- all expressions are calculated as floating point
- if an expression is read into an
int, it is automatically rounded, as long as the value is within1e-3of an integer. If the value is more than1.e-3from an integer, an error is thrown.- but, if you actually want to put in something like
nxpe=nx/42for an integer parameternxpewhere the expression might not give an integer result, you can round it usingnxpe=round(nx/42)to avoid the error.
- but, if you actually want to put in something like
Seems it might be easier to use boutcore :-)
I think this depends on what level you want to be able to reproduce the inputs (and at what point an input become a calculated result). The BOUT.inp is programmable, with a pretty complete parser. This functionality is really useful when doing MMS testing or complex inputs. The file can be read as a pretty standard INI file (with subsections not being standardised). The inputs however would then just be stored as strings if they are expressions. To fully duplicate the parsing and evaluation which BOUT++ applies to its input might be quite a bit of work. I think @dschwoerer is probably right that at that point it becomes easier to just use 'boutcore'.
It is useful to be able to get at least some settings evaluated from the input file, for things like normalisations. One could argue that the PhysicsModel code should just save everything necessary to the dump files, and I wouldn't argue with that, but it would be nice to have less code to write... would it make sense to save at least all the scalar options to the output files (which would then give us access to them in evaluated form)?
Edit: I guess the problem would be that the Options objects should not need to know about a Datafile dump global object, or PhysicsModel-specific object ideally, but maybe PhysicsModel could set a dump file for the Options during initialisation?
at that point it becomes easier to just use 'boutcore'.
But you would have to import BOUT's c++ code for that then right? So xBOUT would no longer be a pure python package.
The file can be read as a pretty standard INI file (with subsections not being standardised).
I don't know if it will ever be worth it to change the format (BOUT v5 maybe?), but I think YAML would have been the best choice. It's clear and human-readable, supports nesting, and so parsers could be used out of the box in both python (pyyaml) and C++ (yaml-cpp). You can do nested sections with indentation like this:
mesh:
- nx: 484
- dx: Lx/(nx-4)
ddx:
- 'first': 'C2'
there is a nice example of a .yml file used in a similar way here.
To fully duplicate the parsing and evaluation which BOUT++ applies to its input might be quite a bit of work.
I'm not sure it's that bad - string manipulation in python is pretty comprehensive and it's a task very amenable to unit testing.
It is useful to be able to get at least some settings evaluated from the input file, for things like normalisations.
Yes, even just my mesh: dx = Lx / nx example justifies doing this in my opinion.
On 12/20/19 2:09 PM, Tom Nicholas wrote:
at that point it becomes easier to just use 'boutcore'.But you would have to import BOUT's c++ code for that then right? So xBOUT would no longer be a pure python package.
boutcore needs to link against the BOUT++ library. And a C++ compiler
is needed to compile the cython generated boutcore code. But I assume a
C/C++ compiler is anyway needed, as the netcdf python package needs also
a compiler.
If you would rely on boutcore as external package, xBOUT would still be as pure python as it is today.
It will make however installation more difficult, as you not only need a netcdf library (among others) but also the bout++ library for boutcore.
Thanks for the clarification David.
I assume a C/C++ compiler is anyway needed, as the netcdf python package needs also a compiler.
Well the user doesn't have to worry about this if they have installed netcdf-python via conda or some other way.
It will make however installation more difficult, as you not only need a netcdf library (among others) but also the bout++ library for boutcore.
This was what I meant - we don't want people to have to download the bout++ library or install anything beyond just pip install xbout, in order to simply open up their data.
If you would rely on boutcore as external package
It would be nice to include boutcore in xbout, but it should be as an optional dependency.
Almost all the things that are hard about reading a BOUT.inp file are to do with evaluating expressions, which is the point where the escaping and special handling is needed. Reading the INI file is pretty trivial, and I think easier for both humans and machines to read and write than YAML: what I've read about YAML is not generally very positive.
hypnotoad has some different issues with an Options class https://github.com/boutproject/hypnotoad/issues/27 - maybe a solution could cover both use-cases? Possibly a base 'generic options' class that could be used directly by hypnotoad and a subclass with extra more BOUT-specific stuff like evaluating expressions?
I imagine #97 could be made general enough to cover both. That PR is on the backburner for me now though. If someone wants to pick it up the comments and tests should make that easier.
On thinking a bit more, a shared options class between hypnotoad (https://github.com/boutproject/hypnotoad/issues/27) and xBOUT may not be such a good idea:
hypnotoadwants immutable options (i.e. a given list, defined in the class definition), with functionality to set defaults and (possibly) check allowed valuesxBOUTwants to be able to read and make available to the user whatever's in the input file, and doesn't need default values (if anything needed is not present, then it should be an error)