
feat: Add Cramér-Rao uncertainties + covariance using autodiff to non-minuit fits by default

Open phinate opened this pull request 1 year ago • 15 comments

Description

One of the main reasons minuit is the default optimizer is that it packages up fit results with uncertainties. We can do that for any fitting procedure using the Fisher information matrix!

This PR adds this behaviour by default for the case that the optimizer isn't minuit. The scipy.optimize fit result object is packaged up with the uncertainty info as if it had been calculated natively.
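For reference, a minimal sketch of the underlying idea (illustrative only, not the code added in this PR): with an autodiff backend such as jax, the parameter covariance is the inverse of the observed Fisher information, i.e. the Hessian of the negative log-likelihood at the best-fit point.

import jax
import jax.numpy as jnp
import pyhf

pyhf.set_backend("jax")  # autodiff-capable backend

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
data = jnp.asarray([62.0, 63.0] + model.config.auxdata)

bestfit = pyhf.infer.mle.fit(data, model)  # any optimizer, no minuit needed


def twice_nll(pars):
    return -2.0 * model.logpdf(pars, data)[0]


# observed Fisher information = Hessian of -log L at the best-fit point;
# its inverse approximates the parameter covariance (Cramér-Rao)
fisher_info = jax.hessian(twice_nll)(bestfit) / 2.0
covariance = jnp.linalg.inv(fisher_info)
uncertainties = jnp.sqrt(jnp.diag(covariance))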

(still needs tests!)

Would close #2268.

Checklist Before Requesting Reviewer

  • [x] Tests are passing
  • [x] "WIP" removed from the title of the pull request
  • [ ] Selected an Assignee for the PR to be responsible for the log summary

Before Merging

For the PR Assignees:

  • [ ] Summarize commit messages into a comprehensive review of the PR

phinate avatar Aug 09 '23 11:08 phinate

Right now pyhf.infer.mle.fit(..., return_uncertainties=True) silently ignores that kwarg if the backend isn't autodiff-enabled as far as I can see.

Can you point to the specific line(s) here? I can't see that behaviour on first pass.

(edit: as Alex said, this is just already in main somewhere, not part of this PR!)

phinate avatar Aug 09 '23 14:08 phinate

Thanks @phinate! This would be a welcome addition I think. I probably won't have time to review this until later this week, but from a quick scan looks like there are still some development things to do and so I've put it in draft mode. There might be enough work here to split things out across maybe 2 PRs to try to keep things more atomic, but we can see.

This would close Issue #2268, correct? If so, can you please note that in the PR body?

matthewfeickert avatar Aug 09 '23 22:08 matthewfeickert

Thanks @phinate! This would be a welcome addition I think. I probably won't have time to review this until later this week, but from a quick scan looks like there are still some development things to do and so I've put it in draft mode. There might be enough work here to split things out across maybe 2 PRs to try to keep things more atomic, but we can see.

This would close Issue #2268, correct? If so, can you please note that in the PR body?

Indeed, sorry I should have made this a draft! Wasn't sure if the draft would trigger the CI or not (was curious if it would pass before local testing).

I've updated the PR with that issue number, thanks for that! That gist was mainly for the errors on yields so I got a bit mixed up, but the title is clear :)

phinate avatar Aug 11 '23 08:08 phinate

Codecov Report

Attention: Patch coverage is 96.87500% with 2 lines in your changes missing coverage. Please review.

Project coverage is 97.88%. Comparing base (64ab264) to head (d3142fc).

:exclamation: Current head d3142fc differs from pull request most recent head 5d3d3d6. Consider uploading reports for the commit 5d3d3d6 to get more accurate results

Files                              Patch %   Lines
src/pyhf/tensor/numpy_backend.py   60.00%    2 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2269      +/-   ##
==========================================
- Coverage   98.21%   97.88%   -0.33%     
==========================================
  Files          69       69              
  Lines        4543     4593      +50     
  Branches      804      814      +10     
==========================================
+ Hits         4462     4496      +34     
- Misses         48       59      +11     
- Partials       33       38       +5     
Flag             Coverage Δ
contrib          97.84% <96.87%> (+0.04%) :arrow_up:
doctest          ?
unittests-3.10   ?
unittests-3.11   ?
unittests-3.12   ?
unittests-3.8    ?
unittests-3.9    96.34% <96.87%> (+0.06%) :arrow_up:

Flags with carried forward coverage won't be shown.


codecov[bot] avatar Aug 14 '23 09:08 codecov[bot]

Worked some more on this, and expanded the scope of the uncertainty and correlation tests to encompass non-minuit options.

Importantly, I've addressed an oversight where the covariance matrix (hess_inv) was not treated correctly during fits with fixed parameters -- it's right to zero out the uncertainties, but the covariance matrix also needs the relevant rows/columns zeroed out (I noticed this when calculating correlations). I don't think this was actively used downstream, but it's now fixed for both minuit and scipy.
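A minimal numpy sketch of what that fix amounts to (the function and example values here are illustrative, not the actual diff):

import numpy as np


def zero_out_fixed(covariance, fixed_idx):
    # Fixed parameters carry zero uncertainty, so the corresponding rows and
    # columns of the covariance matrix are zeroed as well; otherwise the
    # correlations derived from the covariance would be inconsistent.
    cov = np.array(covariance, dtype=float)
    cov[fixed_idx, :] = 0.0
    cov[:, fixed_idx] = 0.0
    return cov


# example: a 3-parameter fit where parameter index 2 was held fixed
cov = zero_out_fixed(np.full((3, 3), 0.04), fixed_idx=[2])
uncertainties = np.sqrt(np.diag(cov))  # -> [0.2, 0.2, 0.0]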

phinate avatar Aug 14 '23 11:08 phinate

Importantly, I've addressed an oversight where the covariance matrix (hess_inv) was not treated correctly during fits with fixed parameters -- it's right to zero out the uncertainties, but the covariance matrix also needs the relevant rows/columns zeroed out (I noticed this when calculating correlations). I don't think this was actively used downstream, but it's now fixed for both minuit and scipy.

I got worried that this would also imply a bug in cabinetry and had a look, but I cannot reproduce what you are describing, or perhaps I am misunderstanding this.

import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)

data = [62, 63] + model.config.auxdata

pyhf.set_backend("numpy", "minuit")
fix_pars = model.config.suggested_fixed()
fix_pars[2] = True
res, corr, res_obj = pyhf.infer.mle.fit(
    data, model, fixed_params=fix_pars, return_correlations=True, return_result_obj=True
)

print(corr)
print(res_obj.minuit.covariance.correlation())

prints

[[ 1.         -0.26340464  0.        ]
 [-0.26340464  1.          0.        ]
 [ 0.          0.          0.        ]]
┌────────────────────┬──────────────────────────────────────────────────────────┐
│                    │                 mu uncorr_bkguncrt[0] uncorr_bkguncrt[1] │
├────────────────────┼──────────────────────────────────────────────────────────┤
│                 mu │                  1               -0.3                  0 │
│ uncorr_bkguncrt[0] │               -0.3                  1                  0 │
│ uncorr_bkguncrt[1] │                  0                  0                  0 │
└────────────────────┴──────────────────────────────────────────────────────────┘

which correctly zeros out those columns, both from the explicit return pyhf gives me and also in the iminuit data structure. Same holds for the covariance (res_obj.minuit.covariance).

alexander-held avatar Aug 14 '23 11:08 alexander-held

which correctly zeros out those columns, both from the explicit return pyhf gives me and also in the iminuit data structure. Same holds for the covariance (res_obj.minuit.covariance).

Ah okay, that's great to know -- I've just located the reason for this after looking through the iminuit issue:

https://github.com/scikit-hep/iminuit/issues/762#issuecomment-1207436406

For covariance matrix, where no such restrictions apply, the elements that correspond to fixed parameters are set to zero.

Sorry for flagging this with the wrong wording! I was just confused by the fact that the uncertainties need to be manually set to zero within pyhf, but the covariance wasn't edited in turn (it is needed for autodiff at least, so it has been useful to look at!)

phinate avatar Aug 14 '23 11:08 phinate

I've marked this ready for review, because it now seems to do the correct thing in the cases I've tried, and the tests are passing. I've not added any new tests -- just updated the uncertainty-related tests in test_optim.py to also include autodiff backends (and no minuit).

This is something I'd like your thoughts on as part of double-checking this:

https://github.com/scikit-hep/pyhf/blob/2787b92eb9d0a92ad0b92ce2961ef3794fe43710/tests/test_optim.py#L425-L428

It seems like for the model that's being tested here, there's quite a big difference in the minuit/autodiff uncertainties (0.6 compared to 0.2). The autodiff result is consistent across backends. @alexander-held made a plot for this pre-fit:

[image: pre-fit plot of the model]

Is it important that we keep this model in particular? It seems like we can tweak it a little to make the uncertainties agree more (e.g. by adding more data in the first bin as Alex suggested), but I'm not 100% sure in which scenarios I'd expect the minuit/autodiff uncertainties to stop agreeing. Open to your thoughts here!

phinate avatar Aug 14 '23 13:08 phinate

What's the central value in this case? I'm guessing the issue might be that the lower default POI bound of 0 is close to the best-fit value and therefore the uncertainty estimates don't make too much sense. Easiest way to avoid that would be relaxing the POI bound (or changing the model / data).
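A hedged sketch of the first suggestion, relaxing the POI bound through the standard par_bounds keyword of pyhf.infer.mle.fit (the numbers here are placeholders):

import pyhf

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
data = [62, 63] + model.config.auxdata

# widen the POI bound so the best-fit value is not pushed against the boundary
bounds = model.config.suggested_bounds()
bounds[model.config.poi_index] = (-5.0, 10.0)

bestfit = pyhf.infer.mle.fit(data, model, par_bounds=bounds)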

alexander-held avatar Aug 14 '23 13:08 alexander-held

What's the central value in this case? I'm guessing the issue might be that the lower default POI bound of 0 is close to the best-fit value and therefore the uncertainty estimates don't make too much sense. Easiest way to avoid that would be relaxing the POI bound (or changing the model / data).

So the test where that line lives fixes the POI to 1 in the fit -- even so, it seems the bounds from suggested_bounds are (-5, 5) in either case. It seems it's just the histosys that's contributing here.

phinate avatar Aug 14 '23 15:08 phinate

If it is just the histosys, I am not really sure, but you could propagate this through to the post-fit yield uncertainties, which should be comparable to sqrt(N). Given the large difference, that should show which of the numbers is correct.

alexander-held avatar Aug 14 '23 17:08 alexander-held

If it is just the histosys, I am not really sure, but you could propagate this through to the post-fit yield uncertainties, which should be comparable to sqrt(N). Given the large difference, that should show which of the numbers is correct.

Doing the fixed POI fit again with mu=1 and propagating:

yields from fit: [127.56220536 184.05513402]
minuit uncert propagated to yields: [ 0.52863025 13.21575613]
autodiff uncert propagated to yields: [ 9.0680227  13.33975763]
sqrt N: [11.29434395 13.56669208]
fitted nuisance par: both ~= -1.218

Looks like autodiff is much closer to sqrt(N), with the difference being in the first bin. I'm still trying to work out whether this kind of severe overconstraint on the NP makes sense given the tight systematic band pre-fit. I think this is at least evidence that the autodiff uncertainties are not malfunctioning in some way, so good news!
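For reference, a minimal sketch of this kind of linear propagation (assuming the jax backend; a global fit and the simple two-bin model are used as stand-ins here, whereas the numbers above come from a fixed-POI fit of the test's model):

import jax
import jax.numpy as jnp
import pyhf

pyhf.set_backend("jax")

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
data = jnp.asarray([62.0, 63.0] + model.config.auxdata)

# best-fit parameters and a covariance from the inverse Hessian (Cramér-Rao)
pars = pyhf.infer.mle.fit(data, model)
hessian = jax.hessian(lambda p: -2.0 * model.logpdf(p, data)[0])(pars) / 2.0
cov = jnp.linalg.inv(hessian)

# linear error propagation: Cov_yields = J Cov J^T with J = d(expected yields)/d(pars)
jacobian = jax.jacfwd(model.expected_actualdata)(pars)
yield_cov = jacobian @ cov @ jacobian.T
yield_uncertainties = jnp.sqrt(jnp.diag(yield_cov))  # compare against sqrt(N)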

For the sake of scoping, should we raise this in an iminuit issue to see if it's expected or not -- i.e. double-check what's an artefact of the model versus the optimizer? Then we can patch the tests accordingly in a later PR if needed, leaving them as-is for now.

(edit: slight update in numbers, accidentally used slightly modified model from original post)

phinate avatar Aug 15 '23 09:08 phinate

How does MINOS compare for the parameter uncertainties?

alexander-held avatar Aug 15 '23 10:08 alexander-held

How does MINOS compare for the parameter uncertainties?

About the same:

[image: MINOS parameter uncertainty comparison]

It's worth noting that this difference between the two methods disappears when doing MLE fits instead of fixing the POI in the fit. I think these tests in test_optim.py are mainly to check the validity of having one entry as zero, and the other as whatever it turns out to be in the fit.
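For concreteness, a small sketch of the two fit configurations being compared, a global MLE fit versus a fixed-POI fit, again with the simple two-bin model as a stand-in for the test's model:

import pyhf

pyhf.set_backend("numpy", "minuit")

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
data = [62, 63] + model.config.auxdata

# global MLE fit: POI and nuisance parameters all floating
global_result = pyhf.infer.mle.fit(data, model, return_uncertainties=True)

# fixed-POI fit: mu held at 1.0, only nuisance parameters floating
fixed_result = pyhf.infer.mle.fixed_poi_fit(1.0, data, model, return_uncertainties=True)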

phinate avatar Aug 16 '23 12:08 phinate

Just wiped the dust off this PR in the Hacktoberfest spirit! Tagging @alexander-held @matthewfeickert for when you have a moment.

To recap the behaviour here:

result = pyhf.optimizer.minimize(..., return_uncertainties=True)

will return uncertainties depending on the choice of optimizer (see the short usage sketch after this list):

  • pyhf.optimizer = Minuit: returns uncertainties as computed by minuit.
  • pyhf.optimizer = anything else (e.g. scipy): returns uncertainties computed using the inverse Hessian evaluated at the MLE point, as dictated by the Cramér–Rao bound.
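A short usage sketch of that recap through the higher-level pyhf.infer.mle.fit, which forwards these keyword arguments to the optimizer (the model is just a placeholder; the returned array follows the existing return_uncertainties layout, with values in the first column and uncertainties in the second):

import pyhf

# autodiff-capable backend with the (non-minuit) scipy optimizer
pyhf.set_backend("jax", "scipy")

model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0, 11.0], bkg=[50.0, 52.0], bkg_uncertainty=[3.0, 7.0]
)
data = [62, 63] + model.config.auxdata

# with this PR, the non-minuit path also returns uncertainties (from the inverse Hessian)
result = pyhf.infer.mle.fit(data, model, return_uncertainties=True)
bestfit, uncertainties = result[:, 0], result[:, 1]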

I have not added a keyword argument to force autodiff uncertainties if using minuit, so if that was desired, that's missing right now.

Also, the discussion above has been incorporated into the tests by allowing the minuit and autodiff uncertainties to drastically disagree, with a link to this discussion added as a comment. I think we verified that it's a result of either the model or of doing a fixed-POI fit to evaluate the uncertainties (the differences disappear when doing a global fit).

Let me know if there's more I can do with this PR before it's good to go!

phinate avatar Oct 20 '23 15:10 phinate