
refactor: Use negative log likelihood as objective function

Open matthewfeickert opened this issue 4 years ago • 9 comments

Description

Resolves #1006

Use the negative log likelihood as the objective function to give to the minimizer instead of twice the negative log likelihood, as it is the more canonical choice.

ReadTheDocs build: https://pyhf.readthedocs.io/en/fix-use-nll-as-objective-func/api.html#inference
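The change is safe for the best-fit parameters because scaling an objective by a positive constant does not move its minimum. A minimal stand-alone sketch (toy single-bin Poisson model, not pyhf code) of this point:

```python
import math

# Toy illustration (not pyhf internals): a single-bin Poisson likelihood
# with observed count n and expected rate mu.
def nll(mu, n=7):
    # negative log likelihood, dropping the constant log(n!) term
    return mu - n * math.log(mu)

def twice_nll(mu, n=7):
    return 2.0 * nll(mu, n)

# Scan a grid: both objectives are minimized at the same mu (= n), so the
# best-fit parameters are unchanged by the factor of 2; only the error
# convention (iminuit's errordef) has to be adjusted.
grid = [0.01 * i for i in range(1, 2001)]
best_nll = min(grid, key=nll)
best_2nll = min(grid, key=twice_nll)
assert best_nll == best_2nll
```

What does change is the curvature of the objective, which is why the uncertainty convention (errordef) must be updated alongside it, as discussed below in the thread.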

Checklist Before Requesting Reviewer

  • [x] Tests are passing
  • [x] "WIP" removed from the title of the pull request
  • [x] Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • [x] Summarize commit messages into a comprehensive review of the PR
* Use the negative log likelihood as the objective function to give to the minimizer
   - More standard choice
   - Update iminuit errordef to 0.5
* Update docstrings and tests to reflect the updated results

matthewfeickert avatar Sep 09 '20 00:09 matthewfeickert

Hm. Somewhat concerningly, we're seeing differences larger than 1e-5 in tests/test_regression.py.

    def test_sbottom_regionA_1300_205_60(
        sbottom_likelihoods_download, get_json_from_tarfile
    ):
        sbottom_regionA_bkgonly_json = get_json_from_tarfile(
            sbottom_likelihoods_download, "RegionA/BkgOnly.json"
        )
        sbottom_regionA_1300_205_60_patch_json = get_json_from_tarfile(
            sbottom_likelihoods_download, "RegionA/patch.sbottom_1300_205_60.json"
        )
        CLs_obs, CLs_exp = calculate_CLs(
            sbottom_regionA_bkgonly_json, sbottom_regionA_1300_205_60_patch_json
        )
>       assert CLs_obs == pytest.approx(0.24443627759085326, rel=1e-5)
E       assert 0.24444183616211698 == 0.24443627759085326 ± 2.4e-06
E        +  where 0.24443627759085326 ± 2.4e-06 = <function approx at 0x7f99299333b0>(0.24443627759085326, rel=1e-05)
E        +    where <function approx at 0x7f99299333b0> = pytest.approx

Not sure if this is just the result of accumulated floating point error or if there is something else happening.

matthewfeickert avatar Sep 09 '20 01:09 matthewfeickert
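One plausible mechanism for shifts at this level (a hypothetical sketch, not pyhf or iminuit internals): a minimizer with an absolute convergence threshold stops at a slightly different point when the objective, and hence its gradient, is everywhere scaled by 2. A toy gradient descent makes this visible:

```python
# Sketch of scale-dependent stopping (toy code, not MINUIT): a minimizer
# with an absolute tolerance on the gradient halts at a slightly different
# point when the objective is scaled, even though the true minimum is the
# same.
def minimize(f, grad, x0, lr=0.1, tol=1e-6, max_iter=100_000):
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:  # absolute tolerance: scale-dependent stopping
            break
        x -= lr * g
    return x

f = lambda x: (x - 1.0) ** 2   # stand-in objective, true minimum at x = 1
g = lambda x: 2.0 * (x - 1.0)
f2 = lambda x: 2.0 * f(x)      # the same objective scaled by 2
g2 = lambda x: 2.0 * g(x)

x_f = minimize(f, g, x0=0.0)
x_f2 = minimize(f2, g2, x0=0.0)
# Both agree with the true minimum well below 1e-5, but not with each
# other to machine precision.
assert abs(x_f - 1.0) < 1e-5 and abs(x_f2 - 1.0) < 1e-5
assert x_f != x_f2
```

Under this reading, a shift of a few parts in 1e-5 in a downstream quantity like CLs is consistent with the optimizer's tolerance rather than a bug in the objective.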

Codecov Report

Merging #1057 (30d8181) into master (e3c21aa) will decrease coverage by 1.34%. The diff coverage is 100.00%.

:exclamation: Current head 30d8181 differs from pull request most recent head 4fdc5b7. Consider uploading reports for the commit 4fdc5b7 to get more accurate results.

@@            Coverage Diff             @@
##           master    #1057      +/-   ##
==========================================
- Coverage   98.12%   96.78%   -1.35%     
==========================================
  Files          64       59       -5     
  Lines        4269     3451     -818     
  Branches      682      499     -183     
==========================================
- Hits         4189     3340     -849     
- Misses         46       68      +22     
- Partials       34       43       +9     
Flag       Coverage Δ
contrib    ?
doctest    ?
unittests  96.78% <100.00%> (+0.60%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pyhf/infer/__init__.py 100.00% <ø> (ø)
src/pyhf/infer/test_statistics.py 100.00% <ø> (ø)
src/pyhf/cli/infer.py 97.53% <100.00%> (-0.09%) :arrow_down:
src/pyhf/infer/mle.py 100.00% <100.00%> (ø)
src/pyhf/optimize/opt_minuit.py 100.00% <100.00%> (ø)
src/pyhf/contrib/viz/brazil.py 18.18% <0.00%> (-81.82%) :arrow_down:
src/pyhf/modifiers/__init__.py 86.44% <0.00%> (-13.56%) :arrow_down:
src/pyhf/events.py 97.14% <0.00%> (-2.86%) :arrow_down:
src/pyhf/tensor/jax_backend.py 95.90% <0.00%> (-2.68%) :arrow_down:
src/pyhf/workspace.py 98.01% <0.00%> (-1.99%) :arrow_down:
... and 54 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 29bc6da...4fdc5b7.

codecov[bot] avatar Sep 09 '20 04:09 codecov[bot]

@HDembinski It seems that we're seeing quite substantial changes from iminuit when we simply change the objective function by a factor of 2. Do you have thoughts on how we might better mitigate this with iminuit to make it more stable?

matthewfeickert avatar Sep 14 '20 17:09 matthewfeickert

There shouldn't be a substantial change apart from minor numerical rounding errors. The factor 2 should only affect the uncertainties and even that only if you didn't compensate by setting the errordef. If you change your function and errordef, the effects should almost perfectly cancel, giving the same result.

HDembinski avatar Sep 14 '20 20:09 HDembinski
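The errordef compensation can be checked numerically. A hedged, self-contained illustration (toy Gaussian numbers, not pyhf code): MINUIT reads a parameter uncertainty from where the objective rises by errordef above its minimum, so halving the objective while also halving errordef leaves the uncertainty unchanged.

```python
# Toy check of the errordef convention (not pyhf/iminuit internals): the
# 1-sigma uncertainty is read off where the objective rises by `errordef`
# above its minimum.  For a Gaussian NLL that rise is 0.5 at one sigma;
# for twice the NLL it is 1.0 -- so switching from 2*NLL (errordef = 1)
# to NLL (errordef = 0.5) gives identical uncertainties.
sigma = 2.0

def nll(mu, x=0.0):
    return (x - mu) ** 2 / (2.0 * sigma**2)

def twice_nll(mu, x=0.0):
    return 2.0 * nll(mu, x)

def crossing(f, errordef, lo=0.0, hi=10.0):
    # bisect for the first mu above the minimum where f rises by errordef
    fmin = f(0.0)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) - fmin < errordef:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

err_nll = crossing(nll, errordef=0.5)
err_2nll = crossing(twice_nll, errordef=1.0)
assert abs(err_nll - sigma) < 1e-6
assert abs(err_nll - err_2nll) < 1e-9
```

Both conventions recover the same uncertainty (here sigma = 2.0), which is the cancellation HDembinski describes.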

> There shouldn't be a substantial change apart from minor numerical rounding errors. The factor 2 should only affect the uncertainties and even that only if you didn't compensate by setting the errordef. If you change your function and errordef, the effects should almost perfectly cancel, giving the same result.

@HDembinski thanks! I was not aware the errordef played a role here... that said, it now makes some sense.

@matthewfeickert we need to update docs here (https://github.com/scikit-hep/pyhf/blob/cab1d8d4392bb76ddd16553f90ab81a6dfd0494a/src/pyhf/optimize/opt_minuit.py#L27-L29) to state that the default is 0.5 and then change the line here: https://github.com/scikit-hep/pyhf/blob/cab1d8d4392bb76ddd16553f90ab81a6dfd0494a/src/pyhf/optimize/opt_minuit.py#L32

kratsg avatar Sep 15 '20 02:09 kratsg

Note: the accuracy of MINUIT is typically way lower than machine precision. Deviations of 1e-6 can be expected, also see https://iminuit.readthedocs.io/en/stable/benchmark.html.

HDembinski avatar Sep 17 '20 07:09 HDembinski
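This squares with the regression failure reported earlier in the thread. Using the numbers from that test output directly, the new CLs value deviates from the reference at the ~2e-5 relative level: just outside the test's rel=1e-5 tolerance, but within the accuracy one can expect from a MINUIT-based fit.

```python
# The regression check from earlier in the thread, reproduced with plain
# arithmetic (values copied from the pytest output above).
ref = 0.24443627759085326
new = 0.24444183616211698
rel_dev = abs(new - ref) / ref
# The deviation is ~2.3e-5: above the test's rel=1e-5 tolerance, but of
# the order one can expect from MINUIT's typical accuracy.
assert 1e-5 < rel_dev < 1e-4
```

A possible takeaway, then, is that the regression tolerance was tighter than the optimizer's accuracy warrants, rather than the new objective being wrong.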

> Note: the accuracy of MINUIT is typically way lower than machine precision. Deviations of 1e-6 can be expected, also see https://iminuit.readthedocs.io/en/stable/benchmark.html.

Interesting, I didn't see this page before! @matthewfeickert we should definitely link to it in our documentation.

kratsg avatar Sep 17 '20 11:09 kratsg

Yes, please do that. However, the benchmark does not tell the whole story. I also did an informal test of a case with limits on parameters and there Minuit performed very well.

HDembinski avatar Sep 29 '20 04:09 HDembinski

Another note: the fact that Minuit stops early, once the accuracy is well below 1 sigma, is an advantage. It is wasteful for a statistical fitter to continue iterating until the parameter values have machine precision.

HDembinski avatar Sep 29 '20 04:09 HDembinski