# Pythonic Specification Generation
I strongly suggest we keep the most stringent JSON spec in #105 as the only JSON schema we have. A "lite JSON" would probably confuse newcomers and those wishing to use an API. Instead, a more user-friendly, pythonic generation layer can be built. Something like the following is certainly possible [inspired by how the construct library does it]:
```python
NPs = [NormSys("JES1"), HistoSys("JES2")]

sample = Sample(
    NormSys("JES1") / Data(...),
    HistoSys("JES2") / Data(...),
)

chan = "singlechannel" / Channel("signal" / sample)
```
then doing something like `json.dumps(chan)` would work out of the box, as you can define how to serialize such an object. `chan` can implement `vars(chan)`, which returns the simple Python structure that can be passed into `hfpdf` -- similar to how `argparse.Namespace` does it.
Caveat: it does not need to be done with division. One could just as easily do `NormSys("JES1").Data(...)` and so on.
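For illustration, here is a minimal sketch (all class names and fields are hypothetical, not existing pyhf API) of how the `/` syntax could be implemented via operator overloading, much as construct does, and how `vars()` then gives JSON serialization for free:

```python
import json

class Modifier:
    """Base class holding a modifier name, type, and optional template data."""
    def __init__(self, name, mod_type):
        self.name = name
        self.type = mod_type
        self.data = None

    def __truediv__(self, data):
        # NormSys("JES1") / Data(...) attaches the template data to the modifier
        self.data = data.values
        return self

class NormSys(Modifier):
    def __init__(self, name):
        super().__init__(name, "normsys")

class Data:
    def __init__(self, values):
        self.values = values

mod = NormSys("JES1") / Data({"hi": 1.1, "lo": 0.9})
print(json.dumps(vars(mod)))
# {"name": "JES1", "type": "normsys", "data": {"hi": 1.1, "lo": 0.9}}
```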
I was thinking about this.
Pydantic has the ability to export JSON schema from its model classes, and I believe codegen from JSON schema is also possible. One possibility here would be to codegen pydantic models from the hand-written JSON schema. These pydantic models could then essentially be used as a Python model-building API, and maybe even be used as the source of truth for the schema itself.
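As a rough sketch of what that could look like (assuming pydantic v2; the field names loosely mirror the workspace spec but are purely illustrative):

```python
from typing import Optional
from pydantic import BaseModel

class Modifier(BaseModel):
    name: str
    type: str
    data: Optional[list[float]] = None

class Sample(BaseModel):
    name: str
    data: list[float]
    modifiers: list[Modifier] = []

class Channel(BaseModel):
    name: str
    samples: list[Sample]

chan = Channel(
    name="singlechannel",
    samples=[
        Sample(
            name="signal",
            data=[5.0, 10.0],
            modifiers=[Modifier(name="mu", type="normfactor")],
        )
    ],
)
print(chan.model_dump())            # plain dict, ready for json.dumps
print(Channel.model_json_schema())  # JSON schema generated from the models
```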
thanks @paulgessinger - certainly also something that @alexander-held has thought about
one could imagine a workflow similar to Keras:
```python
from pyhf.build import histosys, normsys, model  # ... (hypothetical API)

m = model()
m.add(histosys(...))
pdf = pyhf.Model(m.compile())  # from the rendered JSON
pdf.fit(data)
```
which brings it full circle to `MakeModelAndMeasurementsFast` :)
Yeah, that looks like what I had in mind. 'Rendering' pydantic models to dicts which are consumed by `pyhf.Model` is most likely the simplest way to go.
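For reference, the rendered dict would just be a standard workspace spec, e.g. something like the minimal single-channel spec used in the pyhf docs:

```python
import pyhf

spec = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],
                    "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [5.0, 12.0]}
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)  # the rendered pydantic models would produce this dict
```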
I wonder if this is something that can be decoupled from core pyhf, with the JSON schema serving as the data exchange format (either as `pyhf.contrib`, in `cabinetry`, or as a separate project).
@lukasheinrich's example looks nice and could clean up the workspace building in `cabinetry` a bit. Currently it looks like this:
```python
if sample_affected_by_modifier(sample, NormFactor):
    modifiers.append({"data": None, "name": NormFactor["Name"], "type": "normfactor"})
```
and could be more like this (though maybe it's cleaner to follow the workspace structure and attach modifiers to samples rather than to a model):
```python
model.add(normfactor(NormFactor["Name"], get_affected_samples(NormFactor)))
```
Factoring out the names of the keys and the exact format would also mean that things stay stable should the workspace format change, since the API used to build the workspace would change along with it.

The barrier to writing a pyhf workspace feels reasonably low, and given that there really aren't a lot of different modifiers and elements in a workspace, a model-building API could probably be quite lightweight. On the other hand, it's also not too hard for people to implement this themselves, since they would be writing against a rather stable and well-defined workspace specification.
One could consider marshmallow as well.
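As a rough sketch of that direction (assuming marshmallow 3; these schema classes are illustrative, not part of pyhf), a schema mirroring the workspace sample structure could look like:

```python
from marshmallow import Schema, fields

class ModifierSchema(Schema):
    name = fields.Str(required=True)
    type = fields.Str(required=True)
    data = fields.Raw(allow_none=True)

class SampleSchema(Schema):
    name = fields.Str(required=True)
    data = fields.List(fields.Float(), required=True)
    modifiers = fields.List(fields.Nested(ModifierSchema), load_default=list)

# load() validates the input against the schema and returns a plain dict
sample = SampleSchema().load(
    {
        "name": "ttbar",
        "data": [12.0, 11.0],
        "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
    }
)
```

An ORM-style building API on top of such schemas could then look like: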
```python
SR1 = pyhf.orm.Channel('signal_region')
ttbar = pyhf.orm.Sample('ttbar')
wjets = pyhf.orm.Sample('wjets')

SR1.add_sample(ttbar)
SR1.add_sample(wjets)
# SR1.add_samples([ttbar, wjets])
```
the tricky part of the API is that we would need to be able to define ttbar with different expected rates in different channels somehow. One could either restructure the model building to be sample-first instead of channel-first...
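A sample-first variant (again purely hypothetical, extending the `pyhf.orm` sketch above) might instead let a sample carry its per-channel expected rates:

```python
# hypothetical sample-first API: the sample owns its per-channel rates
ttbar = pyhf.orm.Sample('ttbar')
ttbar.set_data('signal_region', [12.0, 11.0])
ttbar.set_data('control_region', [120.0, 110.0])
```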
reviving this based on some discussions with @pfackeldey @kratsg @matthewfeickert