cabinetry
Configuration file design
This issue collects thoughts and ideas about configuration file design (some of which were previously kept in the readme).
Configuration file thoughts
Grouping of options
The configuration file is how analyzers specify their fit model. Experience shows that it can get complex quickly, so it is desirable to group configuration settings in ways that make the file easier to read. For example, the color used to draw a sample in figures does not matter for the fit model. It should be possible to hide such options when inspecting the configuration file, and this could be achieved by grouping them together as "cosmetics".
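As an illustration of why grouping helps, the following is a minimal sketch assuming a hypothetical layout in which each sample keeps its purely visual settings under a "Cosmetics" key (this key and the sample fields are assumptions, not an existing cabinetry schema). Once the cosmetic options live in one group, a single recursive pass can hide them and leave only the settings that define the fit model.

```python
# Hypothetical config layout: the "Cosmetics" group and all keys below are
# illustrative assumptions, not an existing cabinetry schema.
config = {
    "Samples": [
        {
            "Name": "Signal",
            "Path": "signal.root",
            "Cosmetics": {"Color": "red", "Label": "Signal"},
        },
        {
            "Name": "Background",
            "Path": "background.root",
            "Cosmetics": {"Color": "blue"},
        },
    ]
}


def strip_cosmetics(cfg):
    """Return a new structure with every "Cosmetics" group removed, recursively.

    The input is not modified; dicts and lists are rebuilt on the way down.
    """
    if isinstance(cfg, dict):
        return {k: strip_cosmetics(v) for k, v in cfg.items() if k != "Cosmetics"}
    if isinstance(cfg, list):
        return [strip_cosmetics(item) for item in cfg]
    return cfg


# Only the fit-relevant settings remain for inspection.
print(strip_cosmetics(config))
```

Because the cosmetic settings are isolated in one named group, the same idea works for a viewer that collapses them or for a validator that skips them.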
Validation
As much as possible, automatic checks of the configuration file structure and content should happen before running any computationally expensive steps. For example, if input data is declared to be at several different locations, a quick check could verify that data can indeed be found at the declared paths. This can flag typos before any histogram production is run.
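A cheap pre-flight check of this kind could look like the sketch below. It assumes a hypothetical layout where each sample declares its input file under a "Path" key; the function name and keys are illustrative only. It only tests for file existence with `pathlib`, so it runs in negligible time compared to histogram production.

```python
from pathlib import Path


def find_missing_inputs(config):
    """Return the declared input paths that do not exist on disk.

    Assumes a hypothetical layout where each entry in config["Samples"]
    declares its input file under a "Path" key.
    """
    missing = []
    for sample in config.get("Samples", []):
        path = Path(sample["Path"])
        if not path.exists():
            missing.append(str(path))
    return missing


# Usage: report every missing file at once instead of failing on the first one
# deep inside histogram production.
config = {"Samples": [{"Name": "Signal", "Path": "data/signal.root"}]}
for path in find_missing_inputs(config):
    print(f"missing input: {path}")
```

Collecting all missing paths before reporting lets a user fix several typos in one pass.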
Interactions with other existing frameworks
Although ambitious, it would be useful to be able to translate configurations of other existing frameworks into a cabinetry configuration, so that detailed comparisons can be run easily.
Some relevant work for TRExFitter exists here.
Where to specify file paths
Events for a given histogram are located at some path that can be specified by the sample name, region name, and systematic variation. It is unclear how to support as many directory structures as possible while limiting the number of options needed to specify them. See #16.
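One possible approach, shown here only as a sketch and not as the design chosen in #16, is a single path template with placeholders for the sample, region, and systematic variation. The template string, placeholder names, and the `input_path` helper are all hypothetical. Since Python's `str.format` ignores unused keyword arguments, a simpler layout can omit placeholders it does not need, so one option covers many structures.

```python
# Hypothetical template; the placeholder names are illustrative assumptions.
PATH_TEMPLATE = "inputs/{region}/{sample}_{systematic}.root"


def input_path(template, sample, region, systematic="Nominal"):
    """Build the input path for one histogram from a path template.

    Placeholders not present in the template are simply ignored by
    str.format, so templates may use any subset of them.
    """
    return template.format(sample=sample, region=region, systematic=systematic)


print(input_path(PATH_TEMPLATE, "ttbar", "SR", "JES_up"))
# inputs/SR/ttbar_JES_up.root

# A flat layout that only depends on the sample name uses the same mechanism.
print(input_path("{sample}.root", "ttbar", "SR"))
# ttbar.root
```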
Single-element lists
In multiple places in the config, lists of samples, regions, systematics etc. are needed. These could look like this:
"Samples": ["ABC", "DEF"]
For cases where only a single entry is needed, it could either still be written as a single-element list, or alternatively as
"Samples": "ABC"
which turns the value into a string instead. It is desirable to have consistency. During config parsing, everything could be put into a list as needed, or the code further downstream could handle both possible cases. While forcing the user to write everything as a list might result in less aesthetically pleasing results,
"Samples": ["ABC"]
this might still be the best solution overall, as it also spares other tools that use the same config from having to implement parsing for different value types themselves.
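The first option, converting everything into a list during config parsing, can be sketched as follows. The helper names `as_list` and `normalize_lists` and the choice of keys are hypothetical; the point is that downstream code then only ever sees lists, whichever form the user wrote.

```python
import json


def as_list(value):
    """Wrap a single string in a list; return other iterables as a list."""
    if isinstance(value, str):
        return [value]
    return list(value)


def normalize_lists(config, keys=("Samples", "Regions", "Systematics")):
    """Return a copy of config where the given keys always hold lists.

    The set of keys is an illustrative assumption.
    """
    normalized = dict(config)
    for key in keys:
        if key in normalized:
            normalized[key] = as_list(normalized[key])
    return normalized


# Both spellings end up identical after parsing.
print(normalize_lists(json.loads('{"Samples": "ABC"}')))
# {'Samples': ['ABC']}
print(normalize_lists(json.loads('{"Samples": ["ABC", "DEF"]}')))
# {'Samples': ['ABC', 'DEF']}
```

Note the explicit string check: calling `list("ABC")` directly would split the string into characters.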
Alternatively, the confuse library from #27 could be used to obtain values in a specific type.
Reserved values for convenience
For a systematic uncertainty affecting all existing samples, it might be convenient to support a setting like "Samples": "ALL".
This requires reserving such keywords: no sample could be allowed to have this name.
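A minimal sketch of how the reserved keyword could be handled during parsing is shown below; `RESERVED_SAMPLE_NAMES` and `resolve_samples` are hypothetical names. It rejects any sample that uses a reserved name, then expands "ALL" into the full list of declared samples.

```python
# Hypothetical set of reserved keywords that no sample may use as its name.
RESERVED_SAMPLE_NAMES = {"ALL"}


def resolve_samples(value, all_sample_names):
    """Expand a "Samples" setting into an explicit list of sample names.

    Raises ValueError if any declared sample uses a reserved name, since
    "ALL" would otherwise be ambiguous.
    """
    for name in all_sample_names:
        if name in RESERVED_SAMPLE_NAMES:
            raise ValueError(f"sample name {name!r} is reserved")
    if value == "ALL":
        return list(all_sample_names)
    if isinstance(value, str):
        return [value]
    return list(value)


samples = ["Signal", "Background"]
print(resolve_samples("ALL", samples))
# ['Signal', 'Background']
print(resolve_samples("Signal", samples))
# ['Signal']
```

Checking the reserved names up front keeps the ambiguity from ever reaching the fit model.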