pyhf

Let measurement config specify a sequence of booleans in ['parameter']['fixed'] field

Open · lhenkelm opened this issue 3 years ago · 3 comments

Summary

The current schema only allows a single (scalar) boolean for the 'fixed' field in a parameter spec. Since #1639, however, paramsets allow (and expect) a tuple of bools for fixed, one per component of the parameter set. It would therefore be very convenient to align the schema and let measurements in a workspace define fixed per scalar NP, i.e. as a sequence of bools (similar to 'inits' and the other parameter fields).
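For illustration, a measurement config under the proposed schema might look like the sketch below, written as a Python dict that mirrors the JSON workspace layout. This is a hypothetical example, not the current pyhf schema: the parameter name `staterror_SR`, the numbers, and the helper `fixed_length_matches` are made up here. The point is only that `fixed` becomes a per-component list that lines up with `inits` and `bounds`.

```python
# Hypothetical measurement config under the *proposed* schema: "fixed" holds one
# boolean per scalar component, just like "inits" and "bounds".
# The parameter name "staterror_SR" and all numbers are illustrative only.
measurement = {
    "name": "meas",
    "config": {
        "poi": "mu",
        "parameters": [
            {
                "name": "staterror_SR",
                "inits": [1.0, 1.0, 1.0],
                "bounds": [[0.5, 1.5], [0.5, 1.5], [0.5, 1.5]],
                # second bin held constant, the other two float freely
                "fixed": [False, True, False],
            }
        ],
    },
}


def fixed_length_matches(parameter):
    """Hypothetical check: a per-component 'fixed' list must line up with 'inits'."""
    fixed = parameter["fixed"]
    return len(fixed) == len(parameter["inits"]) and all(
        isinstance(flag, bool) for flag in fixed
    )
```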

This would also make it easier to implement a solution to #662: staterror pruning could then simply be a function that calculates the relevant 'fixed' sequences and creates a modified workspace spec to hold them.
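A minimal sketch of such a pruning function follows, assuming the per-component `fixed` list proposed above. The name `prune_staterror`, the argument layout (per-bin nominal yields and absolute stat uncertainties passed in directly), and the "fix the bin when relative uncertainty is below the threshold" rule are all assumptions made for illustration. The rule is one common flavour of threshold-based pruning, not pyhf's implementation. Yields are assumed to be positive.

```python
import copy


def prune_staterror(spec, param_name, yields, uncertainties, threshold):
    """Hypothetical helper: return a copy of `spec` in which every bin of
    `param_name` whose relative stat uncertainty is below `threshold` is fixed.

    Assumes positive `yields` and one entry per bin in both sequences.
    """
    new_spec = copy.deepcopy(spec)  # leave the input workspace spec untouched
    relative = [unc / y for unc, y in zip(uncertainties, yields)]
    fixed = [rel < threshold for rel in relative]
    for measurement in new_spec["measurements"]:
        params = measurement["config"].setdefault("parameters", [])
        for param in params:
            if param["name"] == param_name:
                param["fixed"] = fixed  # overwrite an existing entry
                break
        else:
            # no entry for this parameter yet: add a minimal one
            params.append({"name": param_name, "fixed": fixed})
    return new_spec
```

Because the function returns a new spec rather than mutating the input, the pruned and unpruned workspaces can be compared side by side.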

Code of Conduct

  • [X] I agree to follow the Code of Conduct

lhenkelm, Oct 28 '22 16:10

I thought we already had an issue for this... @alexander-held, we talked about this before...

kratsg, Oct 28 '22 18:10

We talked about related things, e.g. in #1944, but I don't think we have talked specifically about the spec implementation so far. I agree that the spec should support this as well. That could be done by allowing both a boolean and an array of booleans as values (with a single boolean broadcast to all components if needed?), or by only allowing arrays (which would not be backwards-compatible).
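The first option (accept either a single boolean or an array, and broadcast the scalar) could be normalized on the pyhf side roughly as sketched below. The function name `normalize_fixed` and its error behaviour are hypothetical; the sketch only shows the broadcasting rule being discussed, with a length check so that a mismatched array is rejected rather than silently padded.

```python
def normalize_fixed(value, n_components):
    """Hypothetical helper: expand a 'fixed' field to one bool per component.

    A scalar bool is broadcast to every component (the backwards-compatible
    case); an array must already have exactly `n_components` bool entries.
    """
    if isinstance(value, bool):
        return [value] * n_components
    flags = list(value)
    if len(flags) != n_components:
        raise ValueError(
            f"'fixed' has {len(flags)} entries, expected {n_components}"
        )
    if not all(isinstance(flag, bool) for flag in flags):
        raise TypeError("every entry of 'fixed' must be a bool")
    return flags
```

Keeping the scalar form valid means existing workspaces would still validate, at the cost of two accepted shapes for the same field.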

With regard to #662: while this would allow pruning gammas per bin, we would also need to keep track of the pruning threshold somewhere to allow translation to the XML + ROOT format. The ROOT implementation uses the threshold to determine what should be pruned when constructing the model, so this information is not immediately accessible from the XMLs alone. It does sound useful to see the effect of the pruning written out directly in the workspace as well (via fixed parameters). However, editing the threshold would then require changes in two places (the threshold value plus the fixed parameter settings), and a workspace could end up in a state where the two disagree.

alexander-held, Nov 02 '22 08:11

Re: backwards compatibility, I imagine the easiest option (to maintain going forward) is to bump the major version of the schema. Workspace specs know which schema version they want to be validated against, so this would be transparent to users and authors of workspace specs.

Re: #662: yes, when I wrote this I was more concerned with getting the pruning to work on the pyhf side and was not thinking much about enabling the pyhf<->HF roundtrip. But you are right, it is a valid concern.

To me, the "pruning threshold" seems a much more meta piece of information than anything in the pyhf workspace schema so far. That does not mean it should not be included, but I would want the spec to separate facts about the reasoning behind and history of the workspace I am looking at from the bare data of the workspace itself. (Maybe an optional, flexible metadata header could work, coupled with pyhf validation checking that the specified metadata does in fact yield the NP pruning present in the spec?)
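As a rough sketch of that consistency check, suppose a hypothetical optional metadata header stores a key `staterror_threshold`. Neither the header nor the key exists in the pyhf schema; both are assumptions here. Validation could then recompute the expected `fixed` flags from per-bin relative uncertainties and compare them with what the spec actually contains, accepting either a scalar or a list for `fixed`.

```python
def pruning_consistent(parameter, rel_uncertainties, metadata):
    """Hypothetical check: do the 'fixed' flags of `parameter` match what the
    threshold in the (hypothetical) metadata header would produce?

    Assumed rule: a bin is fixed when its relative uncertainty < threshold.
    """
    threshold = metadata.get("staterror_threshold")
    if threshold is None:
        return True  # metadata makes no pruning claim, nothing to check
    fixed = parameter.get("fixed", False)
    if isinstance(fixed, bool):
        # scalar form: broadcast to one flag per bin
        fixed = [fixed] * len(rel_uncertainties)
    expected = [rel < threshold for rel in rel_uncertainties]
    return list(fixed) == expected
```

Keeping the threshold optional means a workspace without the header validates exactly as before, while one that carries it cannot silently drift out of sync with its fixed settings.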

I don't think the JSON spec should tie itself too closely to the XML format here. IMO, the ability to just read the "dumb" data in a JSON spec is a great advantage for debugging and understanding fits. It also gently encourages users toward a robust, "clean architecture"-style way of managing fits. But making the pruning threshold an explicitly required part of the spec may be a step backwards. Of course, as optional, separable metadata with a consistency check as part of validation, it would be fine from this point of view.

lhenkelm, Nov 02 '22 15:11