feat: Configurable constraints per-modifier
Description
I have only added functionality for staterror so far, but this lets you define a modifier with a constraint parameter like so:

```json
"modifiers": [
    {
        "data": [0.4581917541908765],
        "name": "staterror_SR0L_Lnj_Imeff_cuts",
        "type": "staterror",
        "constraint": "poisson"
    }
]
```
This should get picked up correctly. It should also allow for an unconstrained option in the future (to turn the constraint off entirely); see the sketch below.
#760 is kinda related.
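For illustration, here is a minimal sketch of building a model from a spec that uses the new key (channel/sample names and yields are placeholders, not from a real workspace):

```python
# Minimal sketch (on this branch): a spec using the per-modifier "constraint"
# key should build into a model as usual. Names and yields are placeholders.
import pyhf

spec = {
    "channels": [
        {
            "name": "SR0L_Lnj_Imeff_cuts",
            "samples": [
                {
                    "name": "bkg",
                    "data": [50.0],
                    "modifiers": [
                        {"data": None, "name": "mu", "type": "normfactor"},
                        {
                            "data": [0.4581917541908765],
                            "name": "staterror_SR0L_Lnj_Imeff_cuts",
                            "type": "staterror",
                            # new key from this PR; a future value could turn
                            # the constraint off entirely (not implemented yet)
                            "constraint": "poisson",
                        },
                    ],
                }
            ],
        }
    ]
}
model = pyhf.Model(spec, poi_name="mu")
print(model.config.parameters)
```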
Checklist Before Requesting Reviewer
- [ ] Tests are passing
- [ ] "WIP" removed from the title of the pull request
- [ ] Selected an Assignee for the PR to be responsible for the log summary
Before Merging
For the PR Assignees:
- [ ] Summarize commit messages into a comprehensive review of the PR
For debugging purposes, here are likelihood scans:
```python
import numpy as np
import pyhf
import matplotlib as mpl
import matplotlib.pyplot as plt

spec = {
    "channels": [
        {
            "name": "ch",
            "samples": [
                {
                    "data": [1000.0],
                    "modifiers": [
                        {"data": None, "name": "mu_sig", "type": "normfactor"},
                        {"data": [150.0], "name": "staterror_ch", "type": "staterror"},
                    ],
                    "name": "signal",
                }
            ],
        }
    ],
    "measurements": [{"config": {"parameters": [], "poi": "mu_sig"}, "name": "meas"}],
    "observations": [{"data": [1000], "name": "ch"}],
    "version": "1.0.0",
}

# staterror with the default Gaussian constraint
model_gaussian = pyhf.Workspace(spec).model()
print(pyhf.Workspace(spec).data(model_gaussian))

# same staterror, switched to the new Poisson constraint
spec["channels"][0]["samples"][0]["modifiers"][1].update({"constraint": "poisson"})
model_poisson = pyhf.Workspace(spec).model()
print(pyhf.Workspace(spec).data(model_poisson))

# shapesys (Poisson-constrained) for comparison
spec["channels"][0]["samples"][0]["modifiers"][1] = {
    "data": [150.0],
    "name": "shapesys",
    "type": "shapesys",
}
model_shapesys = pyhf.Workspace(spec).model()
print(pyhf.Workspace(spec).data(model_shapesys))

# scan the constraint term over the modifier parameter
par_range = np.linspace(0.1, 1.9, 19)
ll_gaussian = []
ll_poisson = []
ll_shapesys = []
for par in par_range:
    auxdata_stat = pyhf.tensorlib.astensor([1.0])
    auxdata_shape = pyhf.tensorlib.astensor([(1000 / 150) ** 2])
    parameters = pyhf.tensorlib.astensor([1.0, par])
    ll_gaussian.append(model_gaussian.constraint_logpdf(auxdata_stat, parameters))
    ll_poisson.append(model_poisson.constraint_logpdf(auxdata_stat, parameters))
    ll_shapesys.append(model_shapesys.constraint_logpdf(auxdata_shape, parameters))

mpl.style.use("fivethirtyeight")
plt.plot(par_range, ll_gaussian, label="staterror (Gaussian)")
plt.plot(par_range, ll_poisson, label="staterror (Poisson)")
plt.plot(par_range, ll_shapesys, label="shapesys (Poisson)")
plt.legend()
plt.xlabel("parameter value")
plt.ylabel("log likelihood")
plt.tight_layout()
plt.savefig("scan.pdf")
```

I am unsure what the auxdata for a staterror with a Poisson constraint should be, and whether it should change like it does for shapesys. I would expect a Poisson-constrained staterror scan to match shapesys, at least up to a constant offset in case there are normalization differences.
The maximum for the Poisson staterror case in the example above is at 1000/150 = 6.67 (data / staterror), which looks a bit suspicious. Is there something wrong with the test setup? Making the auxdata (1000/150)^2 (like for shapesys) is not the solution either; it just moves the maximum likelihood over by the same factor.
Looking at what actually goes into the final Poisson call in the backend: for shapesys, the rate is the parameter times the auxdata (which itself is (1000/150)^2).
For the staterror with a Poisson term as currently implemented, the rate is (staterror / yield) times the parameter, i.e. 150/1000 * parameter for the example above. Together with the (currently default) auxdata of 1, that leads to a maximum at 1000/150.
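To cross-check that reasoning outside of pyhf, here is a small standalone sketch of where the two Poisson constraint terms peak, using the continuous Poisson form n*log(rate) - rate - lgamma(n+1) (which, as far as I can tell, is what the numpy backend uses):

```python
# Standalone cross-check (not using pyhf): locate the maximum of the
# continuous Poisson log-density n*log(rate) - rate - lgamma(n+1) as a
# function of the modifier parameter gamma.
import numpy as np
from scipy.special import gammaln


def poisson_logpdf(n, rate):
    return n * np.log(rate) - rate - gammaln(n + 1)


gammas = np.linspace(0.1, 10.0, 1_000)

# staterror with Poisson constraint as currently implemented:
# rate = (staterror / yield) * gamma = (150 / 1000) * gamma, auxdata n = 1
ll_stat = poisson_logpdf(1.0, (150.0 / 1000.0) * gammas)
print(gammas[np.argmax(ll_stat)])  # ~6.67, i.e. 1000 / 150

# shapesys: rate = gamma * auxdata, with auxdata n = (1000 / 150)**2
aux = (1000.0 / 150.0) ** 2
ll_shape = poisson_logpdf(aux, gammas * aux)
print(gammas[np.argmax(ll_shape)])  # ~1.0
```

Within the grid resolution the maxima come out at roughly 6.67 and 1.0, consistent with the argument above.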