pyhf
pyhf copied to clipboard
Add support for multiple parameters of interest
Description
The infrastructure should be generic enough to support limit setting on multiple POIs,
What is the status of this? I understand multiple free parameters (normfactor) are possible, but I assume they will all be profiled? It's not entirely clear from the documentation...
we don't have support for this now, but since we added capabilituy to set arbitrary parameters constant in #653 this should be simple to add, once #690 is in, we can expand this to multiple POIs. If you have a usecase, I'm happy to add this functionality
That's great! I'm interested in doing likelihood unfolding, but there are many other use cases for multi-parameter fits (for instance, fitting EFT parameters). Here CLs limits are not really useful but one would be interested in general confidence intervals, Minos intervals (while either profiling other POIs or keeping them fixed), beside simply getting the best fit and covariance matrix.
Some remarks though:
- For likelihood unfolding one should also be able to implement regularisation, i.e. an additional constraint term in the likelihood that depends on the POIs.
- Stuff like EFT parameter fits would require going beyond simple linear scaling of the signal yields with the POIs.
AFAIK neither of those is supported at the moment?
Hi,
I think we're also interested in this now, @lukasheinrich . For the ATLAS hh4b analysis we'd like to do simultaneous ggF and VBF fits, for example, and we're also interested in SMEFT interpretations where we could fit several parameters simultaneously. It'd be great to get support in PyHF for these kind of models!
Best, Max
We can look into adding this into an upcoming release once v0.4.2 and v0.5.0 are shipped — which means soon!
Hi, for basic MLE fits a "POI" is not different from any other normalization factor, so this should already work at the moment. For things like parameter rankings where the calculation is done wrt. a specific parameter (= the POI) it matters.
We will likely explore this in our run up to v0.6.0. #846 and #989 are working towards this -- since you want to fix multiple parameters as part of the fit. I will mention two things in brief:
- as part of v0.5.0 that was released recently, we provide extensible support for custom (user-provided) optimizers to plug into
pyhfthat can be used in the meantime until we have support for multiple fixed parameters. Support will come through the ability to have fits with multiple fixed parameters. We already support an unconstrained fit across all parameters, some of which might be POIs. - HistFactory only allows for the declaration of a single POI in the specification. If you have a HistFactory XML+ROOT workspace that has multiple POIs declared, I would be interested in seeing it.
#1051 is in, so we can move forward.
If you have a HistFactory XML+ROOT workspace that has multiple POIs declared, I would be interested in seeing it.
@kratsg I have one here: /afs/cern.ch/user/a/alheld/public/pyhf_multi_POI
<POI>Signal_norm Bg_norm</POI>
I am not sure whether this is using HistFactory as intended, but it gets correctly interpreted in at least two (ATLAS-internal) tools I tried it with.
Hi,
I was looking forward to this update being part of v0.6.0 but I can see it's now been pushed back to v.0.7.0. Do you have a time estimate for when this development will be complete? Thanks!
Kind regards, James
Now that we have toys in I think we can prfioritize this. We definitely want to deliver 0.7.0 more quickly than 0.6.0 and I hope within perhaps O(2 months) we can at least have some demo notebooks to to multi-POI fits (the code right now should already enable this a you can fits pdfs with arbitrary fixed parameters, but we're lacking convenience APIs.
@jagrundy have you tried doing this manually by any chance?
Hi Lukas,
Thank you for the quick reply and for the update. I haven't tried doing this manually but am aware that it's possible.
James
Yeah, the timeline seems about right. There are a few hurdles in order to get this happening:
- JSON schema (currently POI is stored as a string): can either accept string+space-separated [no change] or list of strings [change to schema]
- anything relying on
model.config.poi_index: one assumption in a large part of the bookkeeping is that there's a single POI. Luckily, not too many core functions rely on this (as fitting can support multiple POIs)
I think while both of these hurdles are rather minor, it's mainly figuring out a consistent API that doesn't throw off a lot of users.
... it's mainly figuring out a consistent API that doesn't throw off a lot of users.
@jagrundy @swertz @mswiatlo what would be helpful is to try to understand what API users are really expecting. As it seems that you're waiting for this can you help by doing the following:
- Create example workspaces with the POIs you can share with us.
- Create and share a script with us that uses the existing API and the example workspaces to perform what you hoping that a future API for multiple POI would do.
- Give examples, or explain in detail, what you're expecting the API return structure to be.
This will help us to be able to get the thing that you're actually wanting out sooner.
Some thoughts from my perspective:
- Breaking the schema to use a list of strings for the POI instead of a space-separated string of POI names seems like a good idea. The latter seems prone to parsing issues. How would you handle parameter names like
"Signal normalization"? Currently those are possible I think. pyhf.infer.hypotestwould need a kwarg to select the POI (I would expect to pick it by name). I think it is reasonable to default to the first POI if only a single POI is present, and to default to the first POI if multiple are present with a warning/info.infer.fixed_poi_fitcould probably behave exactly the same, and the test statistics probably too?model.config.poi_indexcould probably work as it currently does for single POI models if you want to stay backwards compatible. Another function should return the index for a given POI name (with the name as optional kwarg), could again default to the first in the list and possibly warn/info if multiple POIs are in the workspace.- A convenience function to quickly get the list of POI names for a model would be nice, to do things like:
for poi in model.config.poi_list: pyhf.infer.hypotest(..., poi=poi) - In addition, some convenience function to set all but one POI to constant might be nice to have. I imagine there are some use cases where multiple POIs are used for different models and the analyzer is interested in one model at a time. This could either be done by creating new models (which seems a bit convoluted maybe), or directly in the
inferfunctions. I do not have an immediate idea for the API here, might be good to hear from some people who use this kind of setup regularly.
Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?
Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?
are you asking for support in the spec or in the code? pyhf can handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.
- Breaking the schema to use a list of strings for the POI instead of a space-separated string of POI names seems like a good idea. The latter seems prone to parsing issues. How would you handle parameter names like
"Signal normalization"? Currently those are possible I think.
ROOT doesn't quite allow this in the HiFa XML spec. Not sure what we do here. Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...
pyhf.infer.hypotestwould need a kwarg to select the POI (I would expect to pick it by name). I think it is reasonable to default to the first POI if only a single POI is present, and to default to the first POI if multiple are present with a warning/info.infer.fixed_poi_fitcould probably behave exactly the same, and the test statistics probably too?
Yes. Like how we handle multiple measurements.
- A convenience function to quickly get the list of POI names for a model would be nice, to do things like:
for poi in model.config.poi_list: pyhf.infer.hypotest(..., poi=poi)
Yes.
- In addition, some convenience function to set all but one POI to constant might be nice to have. I imagine there are some use cases where multiple POIs are used for different models and the analyzer is interested in one model at a time. This could either be done by creating new models (which seems a bit convoluted maybe), or directly in the
inferfunctions. I do not have an immediate idea for the API here, might be good to hear from some people who use this kind of setup regularly.
This would be where we'd ask for the first/second/third/nth POI fit. mle.infer.fixed_nth_poi_fit(...) which is a wrapper.
Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?
are you asking for support in the spec or in the code?
pyhfcan handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.
Excellent, I am very glad to hear that we can do the multiple POI in the code already, I will try this in my case.
You can look here (fixed_poi_fit is only a few-lines wrapper around pyhf.infer.mle.fit which handles all situations): https://github.com/scikit-hep/pyhf/blob/1884c6c08bc5b629724dd9ef3c60ec4386353b94/src/pyhf/infer/mle.py#L196-L202
setting fixed_params[index] = True and init_pars[index] = fixed_value is all you need, for each POI index. We just haven't gotten to being able to serialize this information consistently.
Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...
A list of strings is arguably the correct solution for this, and I think it's absolutely worth breaking this one particular roundtrip use case in favor of streamlining the experience for everything else. As far as I can tell, the issue only arises in this quite specific setup:
- workspace built / edited in JSON format (otherwise parameters would not have spaces anyway),
- translated to xml+ROOT (this can emit a suitable warning, and auto-convert spaces to underlines or similar),
- translated back to JSON.
The extent to which things "break" is that parameter names change automatically (or have to be changed manually), and that is only relevant if they contain a space, which presumably is very rare to begin with. Given that this can be caught during conversion and the user can be informed, I think this is quite minor.
I tried out a roundtrip from JSON with a POI called Signal norm. Conversion to xml is fine, and xml back to JSON recovers the original spec. A problem only arises with hist2workspace, which looks for two POIs in that setup and cannot find either of them:
WARNING: Can't find parameter of interest: Signal in Workspace. Not setting in ModelConfig.
WARNING: Can't find parameter of interest: norm in Workspace. Not setting in ModelConfig.
Along similar lines, I tried out adding an emoji to a normfactor name: no obvious errors, but in the json2xml -> xml2json roundtrip there's some (presumably encoding-related) slight change: 😀 becomes \ud83d\ude00. (I'm not trying to argue that support for this kind of stuff is needed, but perhaps it may be useful to add a projection to json2xml when encountering Unicode characters, within pyhf itself I see no issues.)
When adding the same emoji to a nuisance parameter name (which has an associated histogram), json2xml works fine, but neither xml2json nor hist2workspace work. That might be a lack of support in ROOT though, since the errors seem to be related to finding the histogram in the .root file.
(edit: the last point was unrelated to pyhf, see https://github.com/scikit-hep/uproot4/issues/576)
Hi,
Can you please confirm if in the current version, multiple POI workspace can be converted to pyhf in JSON format as desired by default? Given that currently many analyses are using profile likelihood unfolding and some are published, it will be good to have this feature for likelihood publication.
Regards, Amartya
pyhf xml2json functionality works normally in the JSON format. It's just not something that's implemented in terms of the functionality. The serialization should be ok.
I see. Thanks!
The "as desired" part is arguable, I would personally find a format where the POI entry is a list of strings better in the longer term. This is a breaking change however. Right now you will get a single string with spaces between POIs like
"poi": "poi_1 poi_2 poi_3"
which pyhf (as of the current implementation) will fail to convert into a model.
You could consider only specifying a single POI in the model, as this is technically metadata without immediate impact on inference.
If there will be a move to a list format ["poi_1", "poi_2", "poi_3"] at some point in the future, that would either require a workspace update or be handled in the pyhf implementation of reading the workspace information (which is tricker since currently whitespaces are supported in POI names, causing possible ambiguities).
@amartyarej how quick of a timescale are you asking for? We have a ton of changes in the pipeline.
We are planning to finalize our studies by end of August