pyhf pyhf.infer.intervals.upper_limits.upper_limit ignores the set confidence level when scan=None

Summary

The method pyhf.infer.intervals.upper_limits.upper_limit ignores the set confidence level via the level parameter if scan=None, but it always computes the upper limits at a 95% CL (level=0.05). This happens with pyhf 0.7.6 and very likely also with older versions (though I did not check them), and the bug is present in the main branch as well.

The reason is simple: here https://github.com/scikit-hep/pyhf/blob/main/src/pyhf/infer/intervals/upper_limits.py#L260 we don't pass the parameter level to the function toms748_scan. Passing **hypotest_kwargs is not enough for passing also level.

The effect is that the upper limits with scan=None (aka using the toms748 algorithm) are always computed using the default value level=0.05 set here https://github.com/scikit-hep/pyhf/blob/main/src/pyhf/infer/intervals/upper_limits.py#L26.

The following patch is enough to fix the main branch:

diff --git a/src/pyhf/infer/intervals/upper_limits.py b/src/pyhf/infer/intervals/upper_limits.py
index 6b86d58..047214f 100644
--- a/src/pyhf/infer/intervals/upper_limits.py
+++ b/src/pyhf/infer/intervals/upper_limits.py
@@ -262,6 +262,7 @@ def upper_limit(
         model,
         bounds[0],
         bounds[1],
+        level,
         from_upper_limit_fn=True,
         **hypotest_kwargs,
     )

OS / Environment

Irrelevant

Steps to Reproduce

Using pyhf 0.7.6:

import numpy as np
import pyhf


model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
observations = [51, 48]
data = pyhf.tensorlib.astensor(observations + model.config.auxdata)
scan = np.linspace(0, 5, 21)

# 95% C.L.
obs_limit, exp_limits, _ = pyhf.infer.intervals.upper_limits.upper_limit(
    data, model, scan=None, return_results=True
)
print('toms748, 95%CL', obs_limit, exp_limits)
obs_limit, exp_limits, _ = pyhf.infer.intervals.upper_limits.upper_limit(
    data, model, scan=scan, return_results=True
)
print('linear grid, 95%CL', obs_limit, exp_limits)

# 90% C.L.
obs_limit, exp_limits, _ = pyhf.infer.intervals.upper_limits.upper_limit(
    data, model, scan=None, level=0.1, return_results=True
)
print('toms748, 90%CL', obs_limit, exp_limits)
obs_limit, exp_limits, _ = pyhf.infer.intervals.upper_limits.upper_limit(
    data, model, scan=scan, level=0.1, return_results=True
)
print('linear grid, 90%CL', obs_limit, exp_limits)

File Upload (optional)

No response

Expected Results

Getting the 90% CL upper limits when scan=None as requested.

Actual Results

toms748, 95%CL 1.0115693894583466 [array(0.55988001), array(0.75702336), array(1.06234693), array(1.50116924), array(2.05078782)]
linear grid, 95%CL 1.0176408862843853 [array(0.59576921), array(0.76169166), array(1.08504773), array(1.50170482), array(2.06654952)]
toms748, 90%CL 1.0115693894583466 [array(0.55988001), array(0.75702336), array(1.06234693), array(1.50116924), array(2.05078782)]
linear grid, 90%CL 0.8635353695838466 [array(0.46812832), array(0.64311021), array(0.90865319), array(1.3133953), array(1.85105345)]

pyhf Version

pyhf, version 0.7.6

Code of Conduct

[X] I agree to follow the Code of Conduct

Dec 10 '24 22:12 GiacomoXT

Thanks for the bug report and diagnosis, @GiacomoXT. We'll want to go through and check to see if there is a historical reason for this that might also require an update to any of the lower bounds of our requirements.

I think all of the pyhf team is either currently traveling for NeurIPS 2024, or hit with some deadlines so this might not get looked at sufficiently until later this week. (You might notice that we're majorly behind on some CI health maintenance as well.)

Dec 10 '24 22:12 matthewfeickert

Thanks @matthewfeickert for the quick feedback! And don't worry, take your time. ;)

Dec 10 '24 22:12 GiacomoXT

No, this is just an oversight from #1274. This is a very nice catch. You'll notice this isn't as intended given that we do pass the level through in successive calls anyhow.

Dec 11 '24 01:12 kratsg