
Pythonic Specification Generation

Open kratsg opened this issue 6 years ago • 8 comments

I strongly suggest we keep the most stringent JSON spec in #105 as the only JSON schema we have. A "lite JSON" would probably confuse newcomers and those wishing to use an API. Instead, a more user-friendly, Pythonic generation layer can be built on top. Something like the following is certainly possible [inspired by how the construct library does it]

NPs = [NormSys("JES1"), HistoSys("JES2")]
sample = Sample(
    NormSys("JES1") / Data(.....),
    HistoSys("JES2") / Data(....)
)
chan = "singlechannel" / Channel( "signal" / sample)

then doing something like json.dump(chan) would work out of the box, since you can define how such an object serializes. chan could implement vars(chan) to return the plain Python structure that can be passed into hfpdf -- similar to how argparse.Namespace does it.

Caveat: the division operator is not strictly needed. One could just as easily write NormSys("JES1").Data("....") and so on.
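To make the operator idea concrete, here is a minimal sketch of how `/` could attach data to a modifier and how the result could serialize to JSON. Note that `NormSys`, `Data`, and `Sample` here are illustrative stand-ins written for this sketch, not actual pyhf classes.

```python
import json

class Data:
    def __init__(self, values):
        self.values = list(values)

class NormSys:
    def __init__(self, name):
        self.name = name
        self.data = None

    def __truediv__(self, data):
        # "NormSys('JES1') / Data(...)" attaches the data to the modifier
        self.data = data
        return self

    def to_dict(self):
        return {"name": self.name, "type": "normsys", "data": self.data.values}

class Sample:
    def __init__(self, *modifiers):
        self.modifiers = list(modifiers)

    def to_dict(self):
        # plain structure that json.dump can serialize out of the box
        return {"modifiers": [m.to_dict() for m in self.modifiers]}

sample = Sample(NormSys("JES1") / Data([0.9, 1.1]))
print(json.dumps(sample.to_dict()))
```

The same `to_dict` hook is what would let `vars(...)`-style rendering hand a plain structure to the model constructor.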

kratsg avatar Apr 10 '18 14:04 kratsg

I was thinking about this.

Pydantic can export a JSON schema from its model classes, and I believe code generation from a JSON schema is also possible. One option here would be to generate pydantic models from the hand-written JSON schema. These pydantic models could then essentially serve as a Python model-building API, and maybe even as the source of truth for the schema itself.
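A small sketch of that direction, assuming pydantic v2 is available; the `Modifier` and `Sample` models below are illustrative, not pyhf's actual schema:

```python
from typing import Optional

from pydantic import BaseModel

class Modifier(BaseModel):
    name: str
    type: str
    data: Optional[list[float]] = None

class Sample(BaseModel):
    name: str
    data: list[float]
    modifiers: list[Modifier] = []

# a JSON schema can be exported straight from the model classes...
schema = Sample.model_json_schema()

# ...and instances render to plain dicts, ready for json.dump
sample = Sample(
    name="signal",
    data=[5.0, 10.0],
    modifiers=[Modifier(name="mu", type="normfactor")],
)
as_dict = sample.model_dump()
```

Going the other way (codegen of the models from the existing hand-written schema) would need an external tool, but the exported schema shows the round trip is at least plausible.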

paulgessinger avatar Oct 01 '20 12:10 paulgessinger

thanks @paulgessinger - certainly also something that @alexander-held has thought about

lukasheinrich avatar Oct 01 '20 12:10 lukasheinrich

one could imagine a workflow similar to keras

from pyhf.build import histosys, normsys, model ...
m = model()
m.add(histosys(...))
pdf = pyhf.Model(m.compile())  # from JSON
pdf.fit(data)

which brings it full circle to MakeModelAndMeasurementsFast :)
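The keras-like builder above can be sketched with a plain class; `pyhf.build` does not exist, so `ModelBuilder` and the `histosys` helper here are hypothetical stand-ins that compile down to a plain dict:

```python
def histosys(name, hi_data, lo_data):
    # illustrative modifier factory returning a plain dict
    return {
        "name": name,
        "type": "histosys",
        "data": {"hi_data": list(hi_data), "lo_data": list(lo_data)},
    }

class ModelBuilder:
    def __init__(self):
        self._modifiers = []

    def add(self, modifier):
        self._modifiers.append(modifier)
        return self  # allow chained m.add(...).add(...)

    def compile(self):
        # render to the plain structure a Model-style constructor could consume
        return {"modifiers": self._modifiers}

m = ModelBuilder()
m.add(histosys("JES2", [11.0], [9.0]))
spec = m.compile()
```

The `compile()` step is where the keras analogy pays off: everything before it is mutable builder state, and everything after it is the frozen JSON-compatible spec.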

lukasheinrich avatar Oct 01 '20 12:10 lukasheinrich

Yeah, that looks like what I had in mind. 'Rendering' pydantic models to dicts that are consumed by pyhf.Model is most likely the simplest way to go.

paulgessinger avatar Oct 01 '20 12:10 paulgessinger

I wonder if this is something that can be decoupled from core pyhf, with the JSON schema serving as the data exchange format (either as pyhf.contrib, in cabinetry, or as a separate project)

lukasheinrich avatar Oct 01 '20 12:10 lukasheinrich

@lukasheinrich's example looks nice and could clean up the workspace building in cabinetry a bit. Currently it looks like this

if sample_affected_by_modifier(sample, NormFactor):
    modifiers.append({"data": None, "name": NormFactor["Name"], "type": "normfactor"})

and could become more like this (though it may be cleaner to follow the workspace structure and attach modifiers to samples rather than to a model)

model.add(normfactor(NormFactor["Name"], get_affected_samples(NormFactor)))

Factoring out the key names and the exact format would also mean that user code stays stable should the workspace format change, since the API used to build the workspace would change along with it.

The barrier to writing a pyhf workspace feels reasonably low, and given that there really aren't many different modifiers and workspace elements, a model-building API could probably be quite lightweight. On the other hand, it's also not too hard for people to implement this themselves, since they would be writing against a rather stable and well-defined workspace specification.
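As a rough illustration of how low that barrier is, here is a minimal workspace built by hand as a plain dict following the general shape of the pyhf workspace format (channels, observations, measurements, version); all names and numbers are illustrative:

```python
import json

workspace = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],
                    "modifiers": [
                        # a free-floating normalization factor on the signal
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {"name": "background", "data": [50.0, 60.0], "modifiers": []},
            ],
        }
    ],
    "observations": [{"name": "singlechannel", "data": [55.0, 65.0]}],
    "measurements": [
        {"name": "meas", "config": {"poi": "mu", "parameters": []}}
    ],
    "version": "1.0.0",
}

print(json.dumps(workspace, indent=2))
```

A model-building API would mostly be factoring out these key names and nesting rules so that users never type them directly.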

alexander-held avatar Oct 01 '20 14:10 alexander-held

One could consider marshmallow as well.

SR1 = pyhf.orm.Channel('signal_region')
ttbar = pyhf.orm.Sample('ttbar')
wjets = pyhf.orm.Sample('wjets')

SR1.add_sample(ttbar)
SR1.add_sample(wjets)
# SR1.add_samples([ttbar, wjets])

the tricky part of the API is that we would need to be able to define ttbar with different expected rates in different channels somehow. One could restructure the model building to be sample-first instead of channel-first...
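One way the sample-first restructuring could look: a sample carries its per-channel expected rates, and the channel-first workspace layout is derived when serializing. These classes are hypothetical sketches, not part of pyhf:

```python
class Sample:
    def __init__(self, name):
        self.name = name
        self.rates = {}  # channel name -> expected yields

    def in_channel(self, channel, data):
        self.rates[channel] = list(data)
        return self  # allow chaining across channels

def to_channels(samples):
    """Pivot sample-first input into the channel-first workspace layout."""
    channels = {}
    for sample in samples:
        for channel, data in sample.rates.items():
            channels.setdefault(channel, []).append(
                {"name": sample.name, "data": data, "modifiers": []}
            )
    return [{"name": name, "samples": s} for name, s in channels.items()]

ttbar = Sample("ttbar").in_channel("SR1", [10.0]).in_channel("CR1", [100.0])
wjets = Sample("wjets").in_channel("SR1", [5.0])
channels = to_channels([ttbar, wjets])
```

The pivot in `to_channels` is the whole trick: users think sample-first, while the output still matches the channel-first structure the workspace expects.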

kratsg avatar Nov 09 '21 16:11 kratsg

reviving this based on some discussions with @pfackeldey @kratsg @matthewfeickert

lukasheinrich avatar Dec 07 '23 15:12 lukasheinrich