# Pythonic Specification Generation
I strongly suggest we keep the most stringent JSON spec in #105 as the only JSON schema we have. A "lite JSON" would probably confuse newcomers and those wishing to use an API. Instead, a more user-friendly, pythonic generation layer can be built. Something like the following is certainly possible [inspired by how the construct library does it]:
```python
NPs = [NormSys("JES1"), HistoSys("JES2")]

sample = Sample(
    NormSys("JES1") / Data(...),
    HistoSys("JES2") / Data(...),
)

chan = "singlechannel" / Channel("signal" / sample)
```
then doing something like `json.dumps(chan)` would work out of the box, as you can define how to serialize such an object. `chan` can implement `vars(chan)`, which returns the simple Python structure that can be passed into `hfpdf` -- similar to how `argparse.Namespace` does it.
Caveat: it does not need to be done with division. One could just as easily do `NormSys("JES1").Data(...)` and so on.
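For illustration, here is a minimal sketch (all class names and fields are hypothetical, not existing pyhf API) of how the `/` syntax could be implemented via operator overloading, much as construct does, and how `vars()` then gives JSON serialization for free:

```python
import json

class Modifier:
    """Base class holding a modifier name, type, and optional template data."""
    def __init__(self, name, mod_type):
        self.name = name
        self.type = mod_type
        self.data = None

    def __truediv__(self, data):
        # NormSys("JES1") / Data(...) attaches the template data to the modifier
        self.data = data.values
        return self

class NormSys(Modifier):
    def __init__(self, name):
        super().__init__(name, "normsys")

class Data:
    def __init__(self, values):
        self.values = values

mod = NormSys("JES1") / Data({"hi": 1.1, "lo": 0.9})
print(json.dumps(vars(mod)))
# {"name": "JES1", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}}
```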
I was thinking about this.
Pydantic has the ability to export JSON schema from its model classes, and I believe codegen from JSON schema is also possible. One possibility here would be to codegen pydantic models from the hand-written JSON schema. These pydantic models could then essentially be used as a Python model-building API, and maybe even be used as the source of truth for the schema itself.
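As a rough sketch of what that could look like (assuming pydantic v2; the field names loosely mirror the workspace spec but are purely illustrative):

```python
from typing import Optional
from pydantic import BaseModel

class Modifier(BaseModel):
    name: str
    type: str
    data: Optional[list[float]] = None

class Sample(BaseModel):
    name: str
    data: list[float]
    modifiers: list[Modifier] = []

class Channel(BaseModel):
    name: str
    samples: list[Sample]

chan = Channel(
    name="singlechannel",
    samples=[
        Sample(
            name="signal",
            data=[5.0, 10.0],
            modifiers=[Modifier(name="mu", type="normfactor")],
        )
    ],
)
print(chan.model_dump())            # plain dict, ready for json.dumps
print(Channel.model_json_schema())  # JSON schema generated from the models
```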
thanks @paulgessinger - certainly also something that @alexander-held has thought about
one could imagine a workflow similar to Keras:
```python
from pyhf.build import histosys, normsys, model  # ... (hypothetical API)

m = model()
m.add(histosys(...))
pdf = pyhf.Model(m.compile())  # from the rendered JSON
pdf.fit(data)
```
which brings it full circle to `MakeModelAndMeasurementsFast` :)
Yeah, that looks like what I had in mind. 'Rendering' pydantic models to dicts which are consumed by `pyhf.Model` is most likely the simplest way to go.
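For reference, the rendered dict would just be a standard workspace spec, e.g. something like the minimal single-channel spec used in the pyhf docs:

```python
import pyhf

spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],
                    "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [5.0, 12.0]}
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)  # the rendered pydantic models would produce this dict
```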
I wonder if this is something that can be decoupled from core pyhf, with the JSON schema serving as the data exchange format (either as `pyhf.contrib`, in `cabinetry`, or as a separate project).
@lukasheinrich's example looks nice and could clean up the workspace building in `cabinetry` a bit. Currently it looks like this:
```python
if sample_affected_by_modifier(sample, NormFactor):
    modifiers.append({"data": None, "name": NormFactor["Name"], "type": "normfactor"})
```
and could be more like this (though maybe it's cleaner to follow the workspace structure and attach modifiers to samples rather than to a model):
```python
model.add(normfactor(NormFactor["Name"], get_affected_samples(NormFactor)))
```
Factoring out the names of the keys and the exact format would also mean that things stay stable should the workspace format change, since the API used to build the workspace would change along with it.

The barrier to writing a pyhf workspace feels reasonably low, and given that there really aren't a lot of different modifiers and elements in a workspace, a model-building API could probably be quite lightweight. On the other hand, it's also not too hard for people to implement this themselves, since they would be writing against a rather stable and well-defined workspace specification.
One could consider marshmallow as well.
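As a rough sketch of that direction (assuming marshmallow 3; these schema classes are illustrative, not part of pyhf), a schema mirroring the workspace sample structure could look like:

```python
from marshmallow import Schema, fields

class ModifierSchema(Schema):
    name = fields.Str(required=True)
    type = fields.Str(required=True)
    data = fields.Raw(allow_none=True)

class SampleSchema(Schema):
    name = fields.Str(required=True)
    data = fields.List(fields.Float(), required=True)
    modifiers = fields.List(fields.Nested(ModifierSchema), load_default=list)

# load() validates the input against the schema and returns a plain dict
sample = SampleSchema().load(
    {
        "name": "ttbar",
        "data": [12.0, 11.0],
        "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
    }
)
```

An ORM-style building API on top of such schemas could then look like: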
```python
SR1 = pyhf.orm.Channel('signal_region')
ttbar = pyhf.orm.Sample('ttbar')
wjets = pyhf.orm.Sample('wjets')

SR1.add_sample(ttbar)
SR1.add_sample(wjets)
# SR1.add_samples([ttbar, wjets])
```
the tricky part of the API is that we would need to be able to define ttbar with different expected rates in different channels somehow. One could either restructure the model building to be sample-first instead of channel-first...
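A sample-first variant (again purely hypothetical, extending the `pyhf.orm` sketch above) might instead let a sample carry its per-channel expected rates:

```python
# hypothetical sample-first API: the sample owns its per-channel rates
ttbar = pyhf.orm.Sample('ttbar')
ttbar.set_data('signal_region', [12.0, 11.0])
ttbar.set_data('control_region', [120.0, 110.0])
```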
reviving this based on some discussions with @pfackeldey @kratsg @matthewfeickert