pyhf
refactor: Use negative log likelihood as objective function
Description
Resolves #1006
Use the negative log likelihood as the objective function to give to the minimizer instead of twice the negative log likelihood, as it is the more canonical choice.
ReadTheDocs build: https://pyhf.readthedocs.io/en/fix-use-nll-as-objective-func/api.html#inference
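For illustration, here is a minimal sketch of what the change amounts to: the minimizer now receives the negative log likelihood directly rather than twice it. The toy Poisson model and `scipy.optimize` usage below are assumptions for the example, not pyhf internals.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy model: Poisson counts with one strength parameter.
# Stand-in for the model's -logpdf(pars, data); not pyhf code.
def negative_log_likelihood(pars, data):
    expected = pars[0] * np.array([5.0, 10.0])
    # Poisson NLL up to a constant: sum(mu - n * log(mu))
    return float(np.sum(expected - data * np.log(expected)))

data = np.array([6.0, 9.0])

# Before this PR: the objective handed to the minimizer was 2 * NLL
twice_nll_fit = minimize(lambda p: 2.0 * negative_log_likelihood(p, data), x0=[0.5])
# After this PR: the objective is the NLL itself
nll_fit = minimize(lambda p: negative_log_likelihood(p, data), x0=[0.5])

# Scaling the objective by a constant does not move the minimum
assert np.allclose(twice_nll_fit.x, nll_fit.x, rtol=1e-4)
```

Where the factor of 2 does matter is in how the minimizer translates changes in the objective into parameter uncertainties, which for `iminuit` is controlled by `errordef` (see the discussion below).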
Checklist Before Requesting Reviewer
- [x] Tests are passing
- [x] "WIP" removed from the title of the pull request
- [x] Selected an Assignee for the PR to be responsible for the log summary
Before Merging
For the PR Assignees:
- [x] Summarize commit messages into a comprehensive review of the PR
* Use the negative log likelihood as the objective function to give to the minimizer
- More standard choice
- Update iminuit errordef to 0.5
* Update docstrings and tests to reflect the updated results
Hm. Perhaps concerningly, we're seeing differences larger than 1e-5 in tests/test_regression.py:
```
def test_sbottom_regionA_1300_205_60(
    sbottom_likelihoods_download, get_json_from_tarfile
):
    sbottom_regionA_bkgonly_json = get_json_from_tarfile(
        sbottom_likelihoods_download, "RegionA/BkgOnly.json"
    )
    sbottom_regionA_1300_205_60_patch_json = get_json_from_tarfile(
        sbottom_likelihoods_download, "RegionA/patch.sbottom_1300_205_60.json"
    )
    CLs_obs, CLs_exp = calculate_CLs(
        sbottom_regionA_bkgonly_json, sbottom_regionA_1300_205_60_patch_json
    )
>   assert CLs_obs == pytest.approx(0.24443627759085326, rel=1e-5)
E   assert 0.24444183616211698 == 0.24443627759085326 ± 2.4e-06
E    +  where 0.24443627759085326 ± 2.4e-06 = <function approx at 0x7f99299333b0>(0.24443627759085326, rel=1e-05)
E    +  where <function approx at 0x7f99299333b0> = pytest.approx
```
Not sure if this is just the result of accumulated floating point error or if there is something else happening.
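For scale, the failure quoted above corresponds to a relative difference of roughly 2.3e-5, just outside the test's `rel=1e-5` tolerance (values taken directly from the pytest output):

```python
observed = 0.24444183616211698
expected = 0.24443627759085326

rel_diff = abs(observed - expected) / expected
print(f"relative difference: {rel_diff:.2e}")  # ~2.27e-05, above rel=1e-5
```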
Codecov Report
Merging #1057 (30d8181) into master (e3c21aa) will decrease coverage by 1.34%. The diff coverage is 100.00%.

:exclamation: Current head 30d8181 differs from pull request most recent head 4fdc5b7. Consider uploading reports for the commit 4fdc5b7 to get more accurate results.
```
@@            Coverage Diff             @@
##           master    #1057      +/-   ##
==========================================
- Coverage   98.12%   96.78%   -1.35%
==========================================
  Files          64       59       -5
  Lines        4269     3451     -818
  Branches      682      499     -183
==========================================
- Hits         4189     3340     -849
- Misses         46       68      +22
- Partials      34       43       +9
```
Flag | Coverage Δ
---|---
contrib | ?
doctest | ?
unittests | 96.78% <100.00%> (+0.60%) :arrow_up:

Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ
---|---
src/pyhf/infer/__init__.py | 100.00% <ø> (ø)
src/pyhf/infer/test_statistics.py | 100.00% <ø> (ø)
src/pyhf/cli/infer.py | 97.53% <100.00%> (-0.09%) :arrow_down:
src/pyhf/infer/mle.py | 100.00% <100.00%> (ø)
src/pyhf/optimize/opt_minuit.py | 100.00% <100.00%> (ø)
src/pyhf/contrib/viz/brazil.py | 18.18% <0.00%> (-81.82%) :arrow_down:
src/pyhf/modifiers/__init__.py | 86.44% <0.00%> (-13.56%) :arrow_down:
src/pyhf/events.py | 97.14% <0.00%> (-2.86%) :arrow_down:
src/pyhf/tensor/jax_backend.py | 95.90% <0.00%> (-2.68%) :arrow_down:
src/pyhf/workspace.py | 98.01% <0.00%> (-1.99%) :arrow_down:
... and 54 more |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.

Powered by Codecov. Last update 29bc6da...4fdc5b7.
@HDembinski It seems that we're seeing quite substantial changes from `iminuit` when we simply change the objective function by a factor of 2. Do you have thoughts on how we might better mitigate this with `iminuit` to make it more stable?
There shouldn't be a substantial change apart from minor numerical rounding errors. The factor 2 should only affect the uncertainties and even that only if you didn't compensate by setting the errordef. If you change your function and errordef, the effects should almost perfectly cancel, giving the same result.
@HDembinski thanks! I was not aware that `errordef` played a role here... that said, it now makes some sense.
@matthewfeickert we need to update the docs here (https://github.com/scikit-hep/pyhf/blob/cab1d8d4392bb76ddd16553f90ab81a6dfd0494a/src/pyhf/optimize/opt_minuit.py#L27-L29) to state that the default is 0.5, and then change the line here: https://github.com/scikit-hep/pyhf/blob/cab1d8d4392bb76ddd16553f90ab81a6dfd0494a/src/pyhf/optimize/opt_minuit.py#L32
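For reference, a minimal standalone sketch of the compensation described above, using the `iminuit` v2 API with a toy Gaussian model (the model, starting values, and tolerances are illustrative assumptions, not pyhf code):

```python
import numpy as np
from iminuit import Minuit

rng = np.random.default_rng(0)
data = rng.normal(1.0, 2.0, size=1000)

def nll(mu, sigma):
    # Gaussian negative log likelihood, dropping constant terms
    return np.sum(np.log(sigma) + 0.5 * ((data - mu) / sigma) ** 2)

def twice_nll(mu, sigma):
    return 2.0 * nll(mu, sigma)

# errordef tells Minuit how much the objective rises one standard deviation
# away from the minimum: 0.5 for an NLL, 1.0 for twice the NLL
m_nll = Minuit(nll, mu=0.0, sigma=1.0)
m_nll.errordef = Minuit.LIKELIHOOD  # 0.5
m_nll.limits["sigma"] = (1e-3, None)
m_nll.migrad()

m_2nll = Minuit(twice_nll, mu=0.0, sigma=1.0)
m_2nll.errordef = Minuit.LEAST_SQUARES  # 1.0
m_2nll.limits["sigma"] = (1e-3, None)
m_2nll.migrad()

# With matched errordef, best-fit values and uncertainties agree
# up to minor numerical rounding
assert np.isclose(m_nll.values["mu"], m_2nll.values["mu"], rtol=1e-4)
assert np.isclose(m_nll.errors["mu"], m_2nll.errors["mu"], rtol=1e-3)
```

Note that the `iminuit` v2 API is assumed here; older versions took `errordef` as a constructor keyword argument instead.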
Note: the accuracy of MINUIT is typically way lower than machine precision. Deviations of 1e-6 can be expected, also see https://iminuit.readthedocs.io/en/stable/benchmark.html.
Interesting, I hadn't seen this page before! @matthewfeickert we should definitely link to it in our documentation.
Yes, please do that. However, the benchmark does not tell the whole story. I also did an informal test of a case with limits on parameters and there Minuit performed very well.
Another note: the fact that Minuit stops early, once the accuracy is well below 1 sigma, is an advantage. It is wasteful for a statistical fitter to continue iterating until the parameter values have machine precision.
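To make the early stopping concrete: Migrad iterates until the estimated distance to minimum (EDM) drops below a goal derived from `tol` and `errordef`, not until the parameter values stop changing at machine precision. A small illustrative sketch (the toy objective and values are assumptions):

```python
from iminuit import Minuit

def nll(x):
    # toy parabolic negative log likelihood with its minimum at x = 3
    return 0.5 * (x - 3.0) ** 2

m = Minuit(nll, x=0.0)
m.errordef = Minuit.LIKELIHOOD
m.tol = 0.1  # iminuit's default tolerance for the EDM stopping criterion
m.migrad()

print(m.fmin.edm)     # achieved EDM at convergence, far below the 1-sigma scale
print(m.values["x"])  # close to 3, but not pinned to machine precision
```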