Large cls value difference between v0.6.3 and master
Summary
When I try to run
$ pyhf cls --test-poi 0.16 signal.json
(file provided to developers) with pyhf, the result depends on the version. For versions before https://github.com/scikit-hep/pyhf/pull/1639, the CLS_exp is about 50% lower than for versions on or after that commit. Testing revealed that this is likely due to the handling of staterrors, as pruning these removed the discrepancy.
OS / Environment
cat /etc/os-release
NAME="Scientific Linux"
VERSION="7.9 (Nitrogen)"
ID="scientific"
ID_LIKE="rhel centos fedora"
VERSION_ID="7.9"
PRETTY_NAME="Scientific Linux 7.9 (Nitrogen)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.9:GA"
HOME_URL="http://www.scientificlinux.org//"
BUG_REPORT_URL="mailto:[email protected]"
REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Scientific Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
Steps to Reproduce
pip install --ignore-installed -U "git+https://github.com/scikit-hep/pyhf.git@9fbbbf9740175fcdfe73fedc4351953be18d2664"
pyhf cls --test-poi 0.16 signal.json
differs from
pip install --ignore-installed -U "git+https://github.com/scikit-hep/pyhf.git@5ea4e0a7751aa446c286b9697e01a9439622ebb5
pyhf cls --test-poi 0.16 signal.json
File Upload (optional)
pyhf devs sent internal file.
Expected Results
I expect the behavior to only change slightly between pyhf versions.
Actual Results
The behavior changes up to 50%.
pyhf Version
pyhf, version 0.6.4.dev33
pyhf, version 0.6.4.dev34
Code of Conduct
- [X] I agree to follow the Code of Conduct
For this we're comparing to v0.6.3 and the current state of master.
@kratsg has been able to identify the issue with with staterror as pruning modifier types of staterror from the workspace in question removes the difference:
- commit a8d34462:
$ pyhf prune -t staterror ws.json | pyhf cls --test-poi 0.16 --backend torch
{
"CLs_exp": [
0.0011805804778050168,
0.0074845548029529455,
0.04146623165427494,
0.17762055416827052,
0.4957891760138955
],
"CLs_obs": 0.5105860394966024
}
$ pyhf prune -t staterror ws.json | pyhf cls --test-poi 0.16 --backend torch
{
"CLs_exp": [
0.0011805782059676592,
0.007484543587818926,
0.04146618658379227,
0.17762042937331468,
0.49578899186050396
],
"CLs_obs": 0.5105860277468133
}
and @lukasheinrich has been able to isolate the difference to these lines
https://github.com/scikit-hep/pyhf/blob/b798ef23c77db6b1d6d606f4b360a5f47fd63dd2/src/pyhf/modifiers/staterror.py#L116-L117
added in PR #1639. We'll work on resolving this soon.
Is it know which behavior is correct? If the old is known to be fine, I can just run with that version.
Is it know which behavior is correct? If the old is known to be fine, I can just run with that version.
I can't say with certainty yet until we finish diagnosing things, but if I had to guess the v0.6.3 behavior is correct.
Hi @mdhank - the prior behavoir is correct and we know where the bug is, just requires implementing a fix. I'll update here once we have one
Given the comments above, I guess the source is already clear, but just in case here is a simpler reproducer:
import pyhf
spec = {
"channels": [
{
"name": "channel",
"samples": [
{"data": [5.0], "modifiers": [], "name": "signal"},
{
"data": [1.0],
"modifiers": [
{
"data": [1.5], # value irrelevant
"name": "NP",
"type": "staterror",
},
],
"name": "bkg",
},
],
}
],
"measurements": [
{
"config": {
"parameters": [],
"poi": "",
},
"name": "meas",
}
],
"observations": [{"data": [0], "name": "channel"}],
"version": "1.0.0",
}
inits = [2.0] # staterror / shapesys value
# staterror version
model = pyhf.Workspace(spec).model()
print(f"staterror: {model.expected_data(inits, include_auxdata=False)}")
# shapesys version
spec["channels"][0]["samples"][1]["modifiers"][0]["type"] = "shapesys"
model = pyhf.Workspace(spec).model()
print(f"shapesys: {model.expected_data(inits, include_auxdata=False)}")
which with 9fbbbf97 will print
staterror: [12.]
shapesys: [7.]
and with 5ea4e0a7 will print
staterror: [7.]
shapesys: [7.]
Both modifier types are multiplicative, so the results should match. After the refactor in #1639, the staterror modifier mistakenly acts on both samples.
Hi @mdhank - the prior behavoir is correct and we know where the bug is, just requires implementing a fix. I'll update here once we have one
The issue is understood to be localized around this line: https://github.com/scikit-hep/pyhf/blob/b798ef23c77db6b1d6d606f4b360a5f47fd63dd2/src/pyhf/modifiers/staterror.py#L116-L117