Wrong `n_dof` for fixed unconstrained parameters in `_goodness_of_fit`
Hey,
I think I found a small bug in the `_goodness_of_fit` function when working with fixed unconstrained parameters. When an unconstrained parameter is fixed in the fit via the `fix_pars` argument, `n_dof` is still calculated from all unconstrained parameters of the model, whether fixed or free.
Code to reproduce:
```python
import logging

import cabinetry
import pyhf

logging.basicConfig(level=logging.DEBUG)

spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0, 15.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0, 70.0],
                    "modifiers": [
                        {"name": "mu_bkg", "type": "normfactor", "data": None}
                    ],
                },
            ],
        }
    ]
}

model = pyhf.Model(spec)
data = [55.0, 70.0, 70.0]

# fix mu_bkg dynamically via fix_pars: n_dof is not reduced
fixed = model.config.suggested_fixed()
fixed[model.config.par_map['mu_bkg']['slice']] = [True]
fit = cabinetry.fit.fit(model, data, fix_pars=fixed, goodness_of_fit=True)
print(fit.goodness_of_fit)

# fix mu_bkg in the model itself instead: n_dof is reduced correctly
model.config.par_map['mu_bkg']['paramset'].suggested_fixed = [True]
fit = cabinetry.fit.fit(model, data, fix_pars=fixed, goodness_of_fit=True)
print(fit.goodness_of_fit)
```
Thanks a lot for raising this! `model_utils.unconstrained_parameter_count` already checks for parameters that are set to be fixed in the model itself, but it cannot catch parameters that are dynamically set to be fixed via `fix_pars`. I need to think a bit about how to best handle this. Perhaps we could calculate, within `fit.fit()`, the difference between the number of `True` values in `fix_pars` and in `model.config.suggested_fixed()`; that difference would then need to be subtracted from `n_dof`.
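For illustration, a minimal sketch of that idea, with plain Python lists standing in for the actual parameter bookkeeping (the values are hypothetical, matching the two-parameter reproducer above):

```python
# suggested_fixed() result from the model: no parameters fixed by default
suggested_fixed = [False, False]  # [mu, mu_bkg]

# fix_pars as passed by the user to fit.fit(): mu_bkg additionally held constant
fix_pars = [False, True]

# number of parameters fixed dynamically but not already fixed in the model;
# this difference would be subtracted from n_dof
extra_fixed = sum(fix_pars) - sum(suggested_fixed)
print(extra_fixed)  # 1
```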
How would we deal with fixed constrained parameters then?
Would you rather correct for this in the `fit()` function directly, or pass `fix_pars` through to `model_utils.unconstrained_parameter_count` as an optional argument? For the latter, a simple correction could be implemented inside that function, since it would have access to `fix_pars`.
Good point, sounds like we should pass the information through to handle constrained parameters correctly.
What do you think of this fix?
```python
from typing import List, Optional

import pyhf


def unconstrained_parameter_count(
    model: pyhf.pdf.Model, fix_pars: Optional[List[bool]] = None
) -> int:
    """Returns the number of unconstrained, non-constant parameters in a model.

    The number is the sum of all independent parameters in a fit. A shapefactor
    that affects multiple bins enters the count once for each independent bin.
    Parameters that are set to constant are not included in the count.

    Args:
        model (pyhf.pdf.Model): model to count parameters for
        fix_pars (Optional[List[bool]], optional): list of booleans specifying
            which parameters are held constant, defaults to None (use ``pyhf``
            suggestion)

    Returns:
        int: number of unconstrained parameters
    """
    n_pars = 0
    # custom fixed parameters overrule suggested fixed parameters
    if fix_pars is None:
        fix_pars = model.config.suggested_fixed()
    for parname in model.config.par_order:
        if not model.config.param_set(parname).constrained:
            par_slice = model.config.par_slice(parname)
            # only count parameters that are not held constant
            n_pars += fix_pars[par_slice].count(False)
    return n_pars
```
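To illustrate the per-bin counting for a shapefactor that the docstring describes, here is a small standalone sketch (the parameter layout and indices are hypothetical, standing in for `model.config.par_slice`):

```python
# hypothetical parameter layout: "mu" at index 0, a 3-bin shapefactor at 1:4
fix_pars = [False, True, False, False]
par_slice = slice(1, 4)  # slice covering the 3-bin shapefactor

# each independent bin counts separately; bins held constant are excluded
n_free_bins = fix_pars[par_slice].count(False)
print(n_free_bins)  # 2
```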
I can open a PR for this if you are happy with it.
Thanks for opening the PR! Looking at the original implementation, I was trying to remember why I wrote it in that specific way. I cannot think of any obvious issues with what you have though. I'll have a closer look at the PR.
No worries, let me know if you want me to change anything.
Resolved by #479, thanks!