pyhf icon indicating copy to clipboard operation
pyhf copied to clipboard

feat: Configurable constraints per-modifier

Open kratsg opened this issue 3 years ago • 3 comments

Description

I only added functionality for staterror right now, but this lets you define a modifier with a constraint parameter like so

"modifiers": [ 
    { 
        "data": [ 
            0.4581917541908765
        ],
        "name": "staterror_SR0L_Lnj_Imeff_cuts",
        "type": "staterror",
        "constraint": "poisson"
    },
]

which should get picked up correctly. Should also allow for an unconstrained option in the future too (to turn off the constraint).

#760 is kinda related.

Checklist Before Requesting Reviewer

  • [ ] Tests are passing
  • [ ] "WIP" removed from the title of the pull request
  • [ ] Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • [ ] Summarize commit messages into a comprehensive review of the PR

kratsg avatar Mar 30 '22 17:03 kratsg

For debugging purposes, here are likelihood scans:

import numpy as np
import pyhf
import matplotlib as mpl
import matplotlib.pyplot as plt

spec = {
    "channels": [
        {
            "name": "ch",
            "samples": [
                {
                    "data": [1000.0],
                    "modifiers": [
                        {"data": None, "name": "mu_sig", "type": "normfactor"},
                        {"data": [150.0], "name": "staterror_ch", "type": "staterror"},
                    ],
                    "name": "signal",
                }
            ],
        }
    ],
    "measurements": [{"config": {"parameters": [], "poi": "mu_sig"}, "name": "meas"}],
    "observations": [{"data": [1000], "name": "ch"}],
    "version": "1.0.0",
}

model_gaussian = pyhf.Workspace(spec).model()
print(pyhf.Workspace(spec).data(model_gaussian))

spec["channels"][0]["samples"][0]["modifiers"][1].update({"constraint": "poisson"})
model_poisson = pyhf.Workspace(spec).model()
print(pyhf.Workspace(spec).data(model_poisson))

spec["channels"][0]["samples"][0]["modifiers"][1] = {
    "data": [150.0],
    "name": "shapesys",
    "type": "shapesys",
}
model_shapesys = pyhf.Workspace(spec).model()
print(pyhf.Workspace(spec).data(model_shapesys))

par_range = np.linspace(0.1, 1.9, 19)
ll_gaussian = []
ll_poisson = []
ll_shapesys = []

for par in par_range:
    auxdata_stat = pyhf.tensorlib.astensor([1.0])
    auxdata_shape = pyhf.tensorlib.astensor([(1000 / 150) ** 2])
    parameters = pyhf.tensorlib.astensor([1.0, par])
    ll_gaussian.append(model_gaussian.constraint_logpdf(auxdata_stat, parameters))
    ll_poisson.append(model_poisson.constraint_logpdf(auxdata_stat, parameters))
    ll_shapesys.append(model_shapesys.constraint_logpdf(auxdata_shape, parameters))

mpl.style.use("fivethirtyeight")

plt.plot(par_range, ll_gaussian, label="staterror (Gaussian)")
plt.plot(par_range, ll_poisson, label="staterror (Poisson)")
plt.plot(par_range, ll_shapesys, label="shapesys (Poisson)")
plt.legend()
plt.xlabel("parameter value")
plt.ylabel("log likelihood")
plt.tight_layout()
plt.savefig("scan.pdf")

scan

I am unsure about what the auxdata for staterror with Poisson constraint should be, and whether it should change like in the shapesys. I would expect that a Poisson staterror scan matches shapesys, at least up to some constant offset in case there are some normalization differences.

alexander-held avatar Mar 31 '22 14:03 alexander-held

The maximum for the Poisson staterror case in the example above is at 1000/150 = 6.67 (data / staterror), which looks a bit suspicious. Is there something wrong with the test setup? Making the auxdata (1000/150)^2 (like shapesys) is not the solution either, it moves the maximum likelihood over by the same factor.

alexander-held avatar Mar 31 '22 15:03 alexander-held

Looking what actually goes into the final Poisson call in the backend: for a shapesys, the rate is the parameter times auxdata (which itself is (1000/150)^2). For the staterror with Poisson term as currently implemented, the rate is (staterror/yield) times the parameter, which is 150/1000*parameter for the example above. Together with the (currently default) auxdata of 1, that leads to a maximum in that case at 1000/150.

alexander-held avatar Mar 31 '22 15:03 alexander-held