Validate shape of observations in workspace

Open alexander-held opened this issue 1 year ago • 0 comments

Summary

There is currently no validation of the shape of the observations per channel when reading a workspace and extracting data from it. Making that part of the schema validation is probably difficult, but I think pyhf.Workspace.data could benefit from a shape check. Otherwise users only run into an error later, when the data is used, and that error may not be easy to understand.

Additional Information

The example below shows how a mis-specified data field in the observations passes through unchecked and subsequently causes the exception

pyhf.exceptions.InvalidPdfData: eval failed as data has len 2 but 1 was expected

which is probably not very easy to understand for non-experts. I ran into this setup while manually editing a workspace for debugging purposes.


import pyhf

spec = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {
                    "data": [15.0],
                    "modifiers": [
                        {"data": None, "name": "mu", "type": "normfactor"},
                    ],
                    "name": "Signal",
                }
            ],
        }
    ],
    "measurements": [
        {"config": {"parameters": [], "poi": "mu"}, "name": "minimal_example"}
    ],
    "observations": [{"data": [15.0, 20.0], "name": "SR"}],
    "version": "1.0.0",
}

ws = pyhf.Workspace(spec)
model = ws.model()
data = ws.data(model)  # perhaps an error should be raised at this point?
print(data)  # [15.0, 20.0]

pyhf.infer.mle.fit(data, model)  # this fails as the data has the wrong shape
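
For illustration, here is a minimal sketch of the kind of shape check suggested above. It works on the plain spec dictionary only and does not touch pyhf internals. The helper name check_observation_shapes and the error message are hypothetical and not part of pyhf, and the sketch assumes every sample in a channel has the same number of bins, so it reads the bin count from the first sample.

```python
def check_observation_shapes(spec):
    """Raise ValueError if an observation's bin count differs from its channel's.

    Hypothetical helper for illustration; not part of the pyhf API.
    """
    # Expected number of bins per channel, read from the first sample
    # (assumption: all samples in a channel share the same binning).
    expected_bins = {
        channel["name"]: len(channel["samples"][0]["data"])
        for channel in spec["channels"]
    }
    for observation in spec["observations"]:
        name = observation["name"]
        if name not in expected_bins:
            continue  # observations for unknown channels are out of scope here
        n_observed = len(observation["data"])
        n_expected = expected_bins[name]
        if n_observed != n_expected:
            raise ValueError(
                f"channel '{name}': observation has {n_observed} bins, "
                f"but the channel defines {n_expected}"
            )


# Reduced version of the spec above: one bin in the channel, two in the observation.
bad_spec = {
    "channels": [
        {"name": "SR", "samples": [{"name": "Signal", "data": [15.0], "modifiers": []}]}
    ],
    "observations": [{"name": "SR", "data": [15.0, 20.0]}],
}

try:
    check_observation_shapes(bad_spec)
except ValueError as err:
    message = str(err)
    print(message)
```

A check along these lines could run when the data is extracted from the workspace, so the mismatch is reported by channel name before any fit is attempted.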

Code of Conduct

  • [X] I agree to follow the Code of Conduct

alexander-held, Mar 30 '23 07:03