
Support asimov data in the HiFa spec

Open · kratsg opened this issue 3 years ago · 2 comments

Summary

As it turns out, the HistFactory XML spec allows a <Channel /> with no <Data />. In those cases, Asimov data appears to be used instead.

We don't currently handle this in pyhf. We haven't yet seen a user spec where <Data/> was missing, so we never caught this uncovered case. I found it in readxml.py while adding type hints there as part of #1909. Specifically, in https://github.com/scikit-hep/pyhf/blob/acde7f4ff8d0db2351f5d6e31ff5584e34da0cf0/src/pyhf/readxml.py#L229-L233 `parsed_data = None` can occur. When we then write out the observations, we end up with `'data': None`, which the HiFa JSON spec does not currently allow (though it technically should).
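A minimal sketch of the uncovered case, using the standard library's `xml.etree.ElementTree` rather than pyhf's own readxml.py: when a <Channel> has no <Data> child, `find("Data")` returns None and the resulting observation entry carries `'data': None`. The helper name `observation_entry` is hypothetical, and keeping the HistoName string is only a stand-in for reading the bin contents from the ROOT file.

```python
import xml.etree.ElementTree as ET


def observation_entry(channel_xml: str) -> dict:
    """Build a HiFa-style observation entry for one <Channel> (hypothetical helper)."""
    channel = ET.fromstring(channel_xml)
    # Element.find returns None when the channel has no <Data> child
    data = channel.find("Data")
    # Stand-in for histogram loading: real code would read the bin contents
    # named by HistoName from the input ROOT file; here we just keep the name.
    parsed_data = None if data is None else data.get("HistoName")
    return {"name": channel.get("Name"), "data": parsed_data}


with_data = '<Channel Name="SR"><Data HistoName="obs"/><Sample Name="bkg" HistoName="bkg"/></Channel>'
without_data = '<Channel Name="CR"><Sample Name="bkg" HistoName="bkg"/></Channel>'

print(observation_entry(with_data))     # {'name': 'SR', 'data': 'obs'}
print(observation_entry(without_data))  # {'name': 'CR', 'data': None}
```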

When a channel has no data, we should probably generate Asimov data for it. This feels very "magical" to me, but it may be functionality users want, even though most frameworks that export to XML+ROOT don't appear to use it right now.
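One way this could look, sketched on a HiFa-style JSON dict rather than through pyhf's API: any observation whose `data` is None is replaced by the bin-wise sum of that channel's nominal sample yields. This assumes every modifier leaves yields unchanged at its initial value (normfactors at 1, lumi initialized to 1, nuisance parameters at their central values); the function names are hypothetical.

```python
from typing import List


def nominal_asimov(channel: dict) -> List[float]:
    """Bin-wise sum of nominal sample yields for one HiFa-style channel.

    Assumes all modifiers sit at values that leave yields unchanged.
    """
    n_bins = len(channel["samples"][0]["data"])
    totals = [0.0] * n_bins
    for sample in channel["samples"]:
        for i, count in enumerate(sample["data"]):
            totals[i] += count
    return totals


def fill_missing_observations(spec: dict) -> dict:
    """Return a copy of `spec` with Asimov counts wherever data is None."""
    channels = {c["name"]: c for c in spec["channels"]}
    filled = []
    for obs in spec["observations"]:
        if obs["data"] is None:
            obs = {"name": obs["name"], "data": nominal_asimov(channels[obs["name"]])}
        filled.append(obs)
    return {**spec, "observations": filled}


spec = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {"name": "sig", "data": [5.0, 10.0], "modifiers": []},
                {"name": "bkg", "data": [50.0, 60.0], "modifiers": []},
            ],
        }
    ],
    "observations": [{"name": "SR", "data": None}],
}

print(fill_missing_observations(spec)["observations"])
# [{'name': 'SR', 'data': [55.0, 70.0]}]
```

Returning a copy instead of mutating the input keeps the original spec available for comparison, which seems prudent given how "magical" the substitution is.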

Additional Information


Code of Conduct

  • [X] I agree to follow the Code of Conduct

kratsg, Jul 02 '22 04:07

See also #566, which includes an example. I have also seen cases where the <Data> tag is kept but no path information is provided.
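Because a <Data> tag may be present yet empty, a check for missing data probably needs to cover both cases. A minimal sketch with `xml.etree.ElementTree`, under the assumption that a <Data> element without a `HistoName` attribute points at no histogram and should be treated like an absent one; the helper name `has_usable_data` is hypothetical.

```python
import xml.etree.ElementTree as ET


def has_usable_data(channel_xml: str) -> bool:
    """True only if the <Channel> has a <Data> child naming a histogram."""
    channel = ET.fromstring(channel_xml)
    data = channel.find("Data")
    # Assumption: <Data/> kept without HistoName carries nothing to read,
    # so it is handled the same way as a missing <Data> element.
    return data is not None and data.get("HistoName") is not None


print(has_usable_data('<Channel Name="SR"><Data HistoName="obs"/></Channel>'))  # True
print(has_usable_data('<Channel Name="CR"><Data/></Channel>'))                  # False
print(has_usable_data('<Channel Name="VR"/>'))                                  # False
```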

If we add automatic Asimov dataset generation, there needs to be a way to choose the Asimov values of free-floating parameters. This assumes "pre-fit Asimov" is used, which I think generally makes sense here: data might be missing not only from some channels but from all of them, in which case there is no way to first perform a partial fit to actual data.
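A sketch of what "picking the Asimov values of free-floating parameters" could mean for pre-fit Asimov data, again on a HiFa-style channel dict: each `normfactor` modifier takes a caller-supplied value, falling back to an assumed default of 1.0, and all other modifiers are assumed to sit at values that leave yields unchanged. The function and argument names are hypothetical.

```python
from typing import Dict, List, Optional


def prefit_asimov(
    channel: dict,
    norm_values: Optional[Dict[str, float]] = None,
    default: float = 1.0,
) -> List[float]:
    """Pre-fit Asimov counts for one HiFa-style channel.

    normfactor parameters use `norm_values`, else `default`; other modifiers
    are assumed to leave the nominal yields unchanged.
    """
    norm_values = norm_values or {}
    n_bins = len(channel["samples"][0]["data"])
    totals = [0.0] * n_bins
    for sample in channel["samples"]:
        # Multiple normfactors on one sample multiply together
        scale = 1.0
        for mod in sample.get("modifiers", []):
            if mod["type"] == "normfactor":
                scale *= norm_values.get(mod["name"], default)
        for i, count in enumerate(sample["data"]):
            totals[i] += scale * count
    return totals


channel = {
    "name": "SR",
    "samples": [
        {"name": "sig", "data": [5.0, 10.0],
         "modifiers": [{"name": "mu", "type": "normfactor", "data": None}]},
        {"name": "bkg", "data": [50.0, 60.0], "modifiers": []},
    ],
}

print(prefit_asimov(channel))               # [55.0, 70.0]
print(prefit_asimov(channel, {"mu": 0.0}))  # [50.0, 60.0]
```

Setting a signal-strength normfactor to 0 gives a background-only Asimov dataset and leaving it at 1 gives the nominal one; which convention a reader should default to is exactly the open question raised above.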

alexander-held, Jul 02 '22 10:07

For now, the solution is to raise a RuntimeError.
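A minimal sketch of such a guard on the HiFa-style observations list, failing loudly instead of emitting `'data': None`. The function name and message are hypothetical and not taken from pyhf.

```python
from typing import List


def check_observations(observations: List[dict]) -> None:
    """Raise RuntimeError if any channel has no observed data (hypothetical guard)."""
    missing = [obs["name"] for obs in observations if obs.get("data") is None]
    if missing:
        raise RuntimeError(
            f"No <Data> found for channel(s) {missing}; "
            "Asimov data is not generated automatically."
        )


check_observations([{"name": "SR", "data": [12.0, 7.0]}])  # passes silently

try:
    check_observations([{"name": "CR", "data": None}])
except RuntimeError as err:
    print(err)
```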

kratsg, Aug 10 '22 21:08