allow data-less workspaces
Description
some workspaces do not include data (think blinded workspaces) and instead just describe the pdf. These workspaces are still useful as they allow us to do expected limits etc..
Currently the validation requires data
This is related to #514. I have ideas on #514 though :)
Hi, adding some more detail to this, since I was writing it up to open an issue before finding this one.
Workspaces without histograms specified in the Data field in a channel cannot be read by pyhf xml2json. Example:
<Data HistoName="" InputFile="" HistoPath=""/>
Such workspaces are commonly used before unblinding an analysis. It is possible to fit the Asimov dataset with them, since all the required information for that is in the remaining histograms. The code tries to interpret the "" as a path to a histogram, and then fails with a IsADirectoryError via uproot.
Minimal example here: /afs/cern.ch/user/a/alheld/public/pyhf_asimov, run via pyhf xml2json minimal_example/RooStats/minimal_example.xml.
One of the following two options might be good:
-
For this specific case of a missing observation histogram, fill all entries with
-1. This is easy to catch by the user, could be accompanied by a warning, and allows the fit model to be built throughpyhf. This behavior should probably not exist for other histograms that define the model and might be missing, since that would alter the model and could lead to unexpected behavior. -
Catch the issue and issue a more descriptive error message. The user would then need to build a different
.xmlinput to be able to usepyhf. This requires manual intervention by the user, but could at least guide them to the right thing to do, as the error otherwise is not easy to understand.
I just ran into this again and it took me a while to realize what the issue was and remember I had seen it before. The current error is very unintuitive.
What do you think about adding a validation against
https://github.com/scikit-hep/pyhf/blob/2b168b7a04becb0087c7cd614f206df4b6e2d92a/src/pyhf/readxml.py#L278-L280
not being ""? Could just raise a NotImplementedError here with the note that the user ran into this presumably because of lack of data (and optionally point to this issue). This at least makes it clear what is happening. I'm happy to prepare a small PR if you are ok with that intermediate solution (eventually that should be replaced by handling the case automatically without exception).
I'm happy to prepare a small PR if you are ok with that intermediate solution (eventually that should be replaced by handling the case automatically without exception).
Sure that's welcome, though I think we should keep this Issue open until we properly fix this for the next release.