pyhf

allow data-less workspaces

Open lukasheinrich opened this issue 6 years ago • 4 comments

Description

Some workspaces do not include data (think blinded workspaces) and instead just describe the pdf. These workspaces are still useful, as they allow us to compute expected limits etc.

Currently, the validation requires data.
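
To illustrate why a data-less workspace is still useful: expected results are derived from the Asimov dataset, which is the model prediction itself. The sketch below is a simplified stand-in, not pyhf code. It assumes a channel is a plain dict of sample yields (the names `channel` and `asimov_data` and all numbers are made up) and ignores every modifier, so the Asimov data reduce to the bin-wise sum of the samples.

```python
# Hypothetical, simplified channel: nominal yields per bin for each sample.
# No observed data are needed anywhere in this computation.
channel = {
    "signal": [5.0, 10.0],
    "background": [50.0, 60.0],
}


def asimov_data(samples):
    """Return the bin-wise sum of all sample yields (nominal prediction)."""
    per_bin = zip(*samples.values())
    return [sum(yields) for yields in per_bin]


expected = asimov_data(channel)
print(expected)  # [55.0, 70.0]
```

In pyhf itself the corresponding prediction comes from the model, e.g. `model.expected_data(pars)`, which also accounts for modifiers and auxiliary data.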

lukasheinrich, Sep 17 '19 09:09

This is related to #514. I have ideas on #514 though :)

kratsg, Sep 17 '19 18:09

Hi, adding some more detail to this, since I was writing it up to open an issue before finding this one.

Workspaces without histograms specified in the Data field in a channel cannot be read by pyhf xml2json. Example:

<Data HistoName="" InputFile="" HistoPath=""/>

Such workspaces are commonly used before unblinding an analysis. It is possible to fit the Asimov dataset with them, since all the information required for that is in the remaining histograms. The code tries to interpret the empty string "" as a path to a histogram, and then fails with an IsADirectoryError via uproot.
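
The confusing error can be reproduced without pyhf or uproot. This is a minimal sketch, assuming the empty InputFile is joined onto the directory that contains the XML files (an assumption about the internals; `rootdir` is a hypothetical name). The joined path is then the directory itself, and opening it as a file fails. The exact exception is platform dependent: POSIX systems raise IsADirectoryError, while Windows reports a PermissionError.

```python
import tempfile
from pathlib import Path

error_name = None

with tempfile.TemporaryDirectory() as rootdir:
    # Joining an empty file name onto a directory returns the directory itself.
    resolved = Path(rootdir) / ""
    assert resolved.is_dir()

    try:
        # Opening a directory as if it were a file fails.
        with open(resolved, "rb"):
            pass
    except (IsADirectoryError, PermissionError) as err:
        # IsADirectoryError on POSIX systems, PermissionError on Windows.
        error_name = type(err).__name__

print(error_name)
```

Nothing in the resulting message mentions the missing observation histogram, which is why the failure is hard to trace back to the empty Data element.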

Minimal example here: /afs/cern.ch/user/a/alheld/public/pyhf_asimov, run via pyhf xml2json minimal_example/RooStats/minimal_example.xml.

One of the following two options might be good:

  • For this specific case of a missing observation histogram, fill all entries with -1. This is easy for the user to catch, could be accompanied by a warning, and allows the fit model to be built through pyhf. This behavior should probably not exist for other histograms that define the model and might be missing, since that would alter the model and could lead to unexpected behavior.

  • Catch the problem and raise a more descriptive error message. The user would then need to build a different .xml input to be able to use pyhf. This requires manual intervention by the user, but could at least guide them to the right thing to do, as the error otherwise is not easy to understand.
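
The first option could look roughly like the following. This is a sketch only: `placeholder_observations` and its arguments are hypothetical names, not part of pyhf, and the number of bins is assumed to be known from the sample histograms of the same channel.

```python
import warnings


def placeholder_observations(channel_name, n_bins):
    """Return -1 for every bin of a channel that has no observed data.

    Hypothetical helper, not part of pyhf. The sentinel -1 is not a valid
    event count, so it is easy to spot downstream.
    """
    warnings.warn(
        f"Channel '{channel_name}' has no observation histogram; "
        "filling observed data with -1.",
        UserWarning,
        stacklevel=2,
    )
    return [-1.0] * n_bins


# Record the warning instead of printing it, to show that exactly one is issued.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data = placeholder_observations("SR", n_bins=3)

print(data)         # [-1.0, -1.0, -1.0]
print(len(caught))  # 1
```

Restricting the placeholder to the observation histogram keeps the model itself untouched, in line with the caveat in the first bullet.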

alexander-held, Sep 29 '20 09:09

I just ran into this again and it took me a while to realize what the issue was and remember I had seen it before. The current error is very unintuitive.

What do you think about adding a validation against

https://github.com/scikit-hep/pyhf/blob/2b168b7a04becb0087c7cd614f206df4b6e2d92a/src/pyhf/readxml.py#L278-L280

not being ""? Could just raise a NotImplementedError here with the note that the user ran into this presumably because of lack of data (and optionally point to this issue). This at least makes it clear what is happening. I'm happy to prepare a small PR if you are ok with that intermediate solution (eventually that should be replaced by handling the case automatically without exception).
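
Such a check could be sketched as below, using only the standard library. `check_data_element` is a hypothetical name, the choice to test the HistoName attribute is an assumption, and the message wording is illustrative; where exactly the check would sit in readxml.py is left to the PR.

```python
import xml.etree.ElementTree as ET


def check_data_element(data_element):
    """Raise a descriptive error if a <Data> element names no histogram.

    Hypothetical helper, not part of pyhf.
    """
    if data_element.attrib.get("HistoName", "") == "":
        raise NotImplementedError(
            "The <Data> element does not specify a histogram (HistoName is "
            "empty). Workspaces without observed data are not supported yet."
        )


element = ET.fromstring('<Data HistoName="" InputFile="" HistoPath=""/>')
try:
    check_data_element(element)
    message = None
except NotImplementedError as err:
    message = str(err)

print(message)
```

A Data element with a non-empty HistoName passes through unchanged, so existing workspaces with observations would not be affected.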

alexander-held, Feb 07 '23 14:02

> I'm happy to prepare a small PR if you are ok with that intermediate solution (eventually that should be replaced by handling the case automatically without exception).

Sure, that's welcome, though I think we should keep this Issue open until we properly fix this for the next release.

matthewfeickert, Feb 07 '23 14:02