Configuration file design

Open alexander-held opened this issue 4 years ago • 0 comments

This issue will collect thoughts and ideas about configuration file design (some of it used to be stored in the readme instead).

Configuration file thoughts

Grouping of options

The configuration file is how analyzers specify their fit model. Experience shows that it can get complex quickly. It is desirable to group configuration settings in ways that can make the file easier to read. For example, the color with which to draw a sample in figures does not matter for the fit model. It should be possible to easily hide such options for easier inspection of the configuration file, and this could be achieved by grouping them together as "cosmetics".
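As a rough illustration of the grouping idea, the sketch below keeps plotting-only options under a dedicated key and strips them before inspection. The key name "Cosmetics", the "Color" option, and the helper `strip_cosmetics` are assumptions made for this example, not settled design.

```python
# Hypothetical layout: options that do not affect the fit model
# are grouped under a "Cosmetics" key for each sample.
config = {
    "Samples": [
        {"Name": "Signal", "Cosmetics": {"Color": "red"}},
        {"Name": "Background", "Cosmetics": {"Color": "blue"}},
    ]
}


def strip_cosmetics(cfg, key="Cosmetics"):
    """Return a copy of cfg with every `key` block removed, at any depth."""
    if isinstance(cfg, dict):
        return {k: strip_cosmetics(v, key) for k, v in cfg.items() if k != key}
    if isinstance(cfg, list):
        return [strip_cosmetics(v, key) for v in cfg]
    return cfg


# Only fit-model-relevant settings remain; the original config is untouched.
model_only = strip_cosmetics(config)
```

With one well-known key holding all such options, hiding them is a single filter rather than a list of special cases.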

Validation

As much as possible, automatic checks of the configuration file structure and content should happen before running any computationally expensive steps. For example, if input data is declared to be at several locations, a quick check could verify that data can indeed be found at the declared paths. This can quickly flag typos before any histogram production is run.
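A minimal sketch of such a pre-flight check, assuming each sample entry carries a "Name" and a "Path" key (both key names and the function `find_missing_inputs` are hypothetical here):

```python
from pathlib import Path


def find_missing_inputs(config):
    """Return (sample name, path) pairs whose declared path does not exist."""
    missing = []
    for sample in config.get("Samples", []):
        path = Path(sample["Path"])  # "Path" is an assumed key name
        if not path.exists():
            missing.append((sample["Name"], str(path)))
    return missing
```

Running this before histogram production and aborting on a non-empty result would surface path typos within seconds.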

Interactions with other existing frameworks

While ambitious, it would be great to be able to translate configurations of other existing frameworks into a cabinetry configuration, to be able to easily run detailed comparisons. Some relevant work for TRExFitter exists here.

Where to specify file paths

Events for a given histogram are located at some path that can be specified by the sample name, region name, and systematic variation. It is unclear how to support as many structures as possible while limiting the number of options needed to specify them. See #16.
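One possible approach, sketched here purely as an assumption and not as the outcome of the discussion in #16, is a single path template with named placeholders. The placeholder names, the example layouts, and `build_path` are all made up for illustration.

```python
def build_path(template, sample, region, systematic):
    """Fill a path template with the three quantities that identify a histogram."""
    return template.format(sample=sample, region=region, systematic=systematic)


# Two hypothetical directory layouts expressed with the same single option.
nested = "inputs/{region}/{systematic}/{sample}.root"
flat = "inputs/{sample}_{region}_{systematic}.root"

nested_path = build_path(nested, "ttbar", "SR", "nominal")
flat_path = build_path(flat, "ttbar", "SR", "nominal")
```

A template covers many layouts with one setting, but it cannot express cases where the path depends on something other than these three names.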

Single-element lists

In multiple places in the config, lists of samples, regions, systematics etc. are needed. These could look like this:

"Samples": ["ABC", "DEF"]

For cases where only a single entry is needed, it could either still be written as a single-element list, or alternatively as

"Samples": "ABC"

which turns the value into a string instead. It is desirable to have consistency. During config parsing, everything could be put into a list as needed, or the code further downstream could handle both possible cases. While forcing the user to write everything as a list might result in less aesthetically pleasing results,

"Samples": ["ABC"]

this still might be the best solution overall, as it also prevents other tools using the same config from having to manually implement the parsing of different types of values.
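For comparison, the option of wrapping values during config parsing could be as small as the following sketch (the helper name `as_list` is hypothetical):

```python
def as_list(value):
    """Wrap a bare value in a list; return lists unchanged."""
    if isinstance(value, list):
        return value
    return [value]


samples_from_string = as_list("ABC")
samples_from_list = as_list(["ABC", "DEF"])
```

The helper is trivial, but every tool reading the same config would need its own copy, which is the argument for requiring lists in the file itself.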

Alternatively, the confuse library from #27 could be used to obtain values as a specific type.

Reserved values for convenience

For a systematic uncertainty affecting all existing samples, it might be convenient to support a setting like "Samples": "ALL". This requires reserving such keywords: no sample could be allowed to have this name.
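A sketch of how a reserved keyword might be handled, assuming "ALL" is the only reserved value; the names `RESERVED`, `check_sample_names`, and `resolve_samples` are invented for this example:

```python
RESERVED = {"ALL"}  # assumed set of reserved keywords


def check_sample_names(names):
    """Raise if any sample uses a reserved keyword as its name."""
    clashes = RESERVED.intersection(names)
    if clashes:
        raise ValueError(f"reserved keyword used as sample name: {sorted(clashes)}")


def resolve_samples(setting, all_samples):
    """Expand the "Samples" setting of a systematic into a list of sample names."""
    if setting == "ALL":
        return list(all_samples)
    if isinstance(setting, str):
        return [setting]
    return list(setting)
```

The name check would belong in the validation step, so that a clash is reported before the keyword is ever expanded.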

alexander-held, Jul 06 '20 09:07