pisa
pisa copied to clipboard
PID stage should output maps that have PID categories as binning dims
Currently, PID stage outputs multiple maps, one for each PID category. This is probably a bug since DistributionMaker.get_outputs(return_sum=True)
I expect will simply collapse all the maps into one, losing any PID information. (need to test this, though...)
I'd much prefer that the behavior of PID being a binning dimension is consistent across the codebase over modifying DistributionMaker.get_outputs(return_sum=True)
(and any other current and future code that has to handle outputs from either a PID stage or a reco w/ binned-PID stage).
See e.g. how make_asymmetry_plots.py
currently implements handling the case where there's a PID stage vs. reco stage that handles PID. (Note that this issue is therefore pertinent to PR #247 & Issue #158 but has implications well beyond that, too.)
Implementing this idea is tricky since translating from what can be very arbitrary PID criteria to a numerical value for binning dimension is fraught. Options for handling this are:
- Actually try to translate from the PID criteria to numerical values. If this fails, either just return the maps separately (which doesn't help with compatibility) OR go to idea (2) below
- Use arbitrary (but not nonsensical) numerical values to indicate bin #, which have no (necessary) correlation with the actual PID values. E.g.: [0,1) -> PID bin 0 (whatever that means); give this bin a
bin_name
, though, according to the first PID criteria label. [1,2) -> PID bin 1, ... , etc.
Idea (1) above will be difficult to implement since PID criteria for the PID stage can be pretty much completely arbitrary.
Idea (2) above will result in numerical binning that is incompatible with pipelines run with PID performed as a binning dimension in the reco stage. However, this forces compatibility with the assumptions made elsewhere in PISA and one pipeline's output using this type of PID stage will be compatible with another pipeline's logic using the same type of PID stage.
If there is further need to make an output of such a PID stage compatible with the output of a reco stage that also handles PID within it, that mapping can be done with more intelligent functions. Also, since both should have the same bin_name
s, this logic shouldn't be terribly difficult to implement.
After discussing this with @philippeller, he and I think idea (2) above is the best way to go. Please provide feedback, though, if idea (1) seems preferable or if there are modifications to the above or further ideas for handling this.
@philippeller committed a fix directly to the cake
branch, he tells me. So closing this.
I could be wrong, but I think this was only done in the param
service. I believe this to be the case since I only had to do a bugfix in the PISA 2 consistency script when testing the param
service. Could you confirm @philippeller? If so, the same modification should probably be applied to the hist
service for consistency.