pyhf
[WIP] Dask Integration
Description
Addresses #259. This was really a drop-in replacement (except for tensorlib.ones(), where one needed to add chunks).
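For context on the chunks remark, here is a minimal sketch of how creating an array of ones differs between NumPy and Dask. It uses only `numpy` and `dask.array` directly, not the pyhf tensorlib; the shape and the single-chunk layout are illustrative assumptions, not what the backend in this PR necessarily uses.

```python
import numpy as np
import dask.array as da

shape = (2, 3)  # illustrative shape

# NumPy allocates the array eagerly.
np_ones = np.ones(shape)

# Dask builds a lazy, chunked array. Here the chunk layout is passed explicitly;
# chunks=shape means one chunk covering the whole array (an illustrative choice).
da_ones = da.ones(shape, chunks=shape)

# Nothing is evaluated until compute() is called.
result = da_ones.compute()
print(result.shape, da_ones.chunks)
```

Choosing a chunk layout is the one place where the Dask call carries information that the NumPy call has no equivalent for, which is presumably why `ones()` could not be swapped in unchanged.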
simple logpdf evaluation works
```python
import pyhf
import pyhf.simplemodels
import pyhf.tensor.dask_backend

# Switch pyhf to the Dask backend added in this PR
db = pyhf.tensor.dask_backend.dask_backend()
pyhf.set_backend(db)

# Single-bin signal + background model with a background uncertainty
pdf = pyhf.simplemodels.hepdata_like(signal_data=[7.], bkg_data=[50.], bkg_uncerts=[7.])
testdata = pdf.expected_data(pdf.config.suggested_init())

# logpdf returns a lazy Dask object; compute() triggers the evaluation
v = pdf.logpdf(pdf.config.suggested_init(), testdata)
print(v.compute())
```
Checklist Before Requesting Approver
- [ ] Tests are passing
- [ ] "WIP" removed from the title of the pull request
Just making @kratsg and @matthewfeickert aware; this is not for review yet. One nice thing is that we get easy visualization of the computational graph.
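As a rough illustration of what the computational graph looks like on the Dask side, the sketch below builds a small lazy expression with plain `dask.array` and inspects its task graph. The expression is a made-up example, not the pyhf logpdf, and the `visualize` call is left commented out because rendering needs the optional graphviz dependency.

```python
import dask.array as da

x = da.ones((4,), chunks=2)   # two chunks of length two
y = (x * 3.0 + 1.0).sum()     # still lazy: only a task graph exists so far

# The collection exposes its task graph as a mapping of keys to tasks.
graph = dict(y.__dask_graph__())
print(len(graph), "tasks in the graph")

# y.visualize(filename="graph.svg")  # renders the graph; needs graphviz installed

value = y.compute()           # evaluates the graph
print(value)
```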
interestingly some of the basic tensorlib tests fail
> assert np.std(values) < 1e-6
E assert 0.2802208533607308 < 1e-06
E + where 0.2802208533607308 = <function std at 0x1155a41e0>([-16.948276294321396, -16.948274612426758, -16.94827651977539, -16.948274612426758, -17.648827643136507])
E + where <function std at 0x1155a41e0> = np.std
a) good thing we have tests :) b) probably good to add more tests for each of the tensorlib methods so that we know better which one is failing
edit: the reason is that the tests compare poisson-from-normal values, not the real Poisson; I forgot to set the flag in the dask backend
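To show how far apart the two can be, here is a minimal sketch in plain Python. It assumes the poisson-from-normal mode means a normal distribution with mean lam and width sqrt(lam); the function names and the values of `n` and `lam` are made up for illustration and are not taken from the failing test.

```python
import math

def poisson_logpdf(n, lam):
    # Exact Poisson log-probability, with the factorial continued via lgamma.
    return n * math.log(lam) - lam - math.lgamma(n + 1.0)

def normal_approx_logpdf(n, lam):
    # Gaussian approximation with mean lam and standard deviation sqrt(lam).
    sigma = math.sqrt(lam)
    return -0.5 * ((n - lam) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

n, lam = 10.0, 5.0  # illustrative values only
exact = poisson_logpdf(n, lam)
approx = normal_approx_logpdf(n, lam)
print(exact, approx, abs(exact - approx))
```

A backend that evaluates the exact form while the others use the approximation will disagree by far more than the 1e-6 tolerance in the test, which matches the single outlier in the values above.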
Coverage decreased (-0.2%) to 97.278% when pulling d8e2a663e776d15ce5058bd23f9165258b160fca on tensor/dask into b143ffd7c7d874c5144a3ca299aad325afc52d45 on master.
@lukasheinrich Can you verify that the updated backend + optimizer table is correct?
I'm going to rebase this against origin/master to bring in the work from PR #262
we definitely need to add the new methods in #262. Apart from that we might want to understand Dask a bit better
@lukasheinrich abs, zeros, concatenate, and einsum are copied in now. As you point out, at the moment we are just using Dask as a NumPy clone for the most part. Are there explicit things that you wanted to try to explore for this PR or should we first do more research on Dask's capabilities and then come back to this?
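For reference, a minimal sketch of the "NumPy clone" usage for these four operations, written against plain `numpy` and `dask.array` rather than the pyhf tensorlib. The input array and the single-chunk layout are illustrative assumptions.

```python
import numpy as np
import dask.array as da

a_np = np.array([[-1.0, 2.0], [3.0, -4.0]])
a_da = da.from_array(a_np, chunks=(2, 2))  # single chunk, illustrative choice

# Each entry pairs the eager NumPy result with the lazy Dask equivalent.
checks = {
    "abs": (np.abs(a_np), da.absolute(a_da)),
    "zeros": (np.zeros((2, 2)), da.zeros((2, 2), chunks=(2, 2))),
    "concatenate": (
        np.concatenate([a_np, a_np], axis=0),
        da.concatenate([a_da, a_da], axis=0),
    ),
    "einsum": (
        np.einsum("ij,jk->ik", a_np, a_np),
        da.einsum("ij,jk->ik", a_da, a_da),
    ),
}

for name, (expected, lazy) in checks.items():
    # compute() materializes the Dask result as a NumPy array for comparison.
    print(name, np.allclose(expected, lazy.compute()))
```

Apart from the chunk arguments, the call signatures line up one to one, so anything beyond a clone would have to come from using chunking or the scheduler deliberately.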