posteriordb
posteriordb copied to clipboard
Add Tensorflow probability and Pyro example for 8-schools
Test to give an additional PPF framework in the database.
If time, also try add Pyro, JAGS and PyMC for 8-schools (but is low prio)
Here's a pymc3 model
import pymc3 as pm3
def model(data):
J = data["J"] # number of schools
y_obs = data["y"] # estimated treatment
sigma = data["sigma"] # std of estimated effect
with pm3.Model() as pymc_model:
mu = pm3.Normal('mu', mu=0, sd=5) # hyper-parameter of mean, non-informative prior
tau = pm3.Cauchy('tau', alpha=0, beta=5) # hyper-parameter of sd
theta_trans = pm3.Normal('theta_trans', mu=0, sd=1, shape=J)
theta = mu + tau * theta_trans
y = pm3.Normal('y', mu=theta, sd=sigma, observed=y_obs)
return pymc_model
To run it
import numpy as np
J = 8
y = np.array([28, 8, -3, 7, -1, 1, 18, 12])
sigma = np.array([15, 10, 16, 11, 9, 11, 10, 18])
data = {"J": J, "y": y, "sigma": sigma}
m = model(data)
trace = pm3.sample(1000, model=m)
I tested that it works.
This is adapted from https://sharanry.github.io/post/eight-schools-model/ The model there had log_tau (maybe it gives better performance with pymc or perhaps it was just preference), I changed that to just tau.
I could make a PR with this.
Great! Two thing.
- We want this to be a stan alone py file for the model I guess?
- We want to be able to run it by pointing to the posteriordb json. Ideally it should be possible to run from R as well. Im not sure how to include the model/pymc in the best way?
My PR #118 (sorry forgot to reference this issue in it, did that now) is probably the best way to include the pymc model with the current design.
I'm not fully sure what you are asking in 2. Do you mean that will the users be able to call code <- model_code(po, framework = "pymc3") and then use the result to run the model? Or do you mean that posteriordb would have something like run_posterior(po, framework = "pymc3")?
Oh I was not super clear. I meant to point to the json data. Ie how would an call using the posteriordb look like?
Oh you mean the actual data? I'll first show how it would look like using python as R will have to go through python anyway.
import importlib
import pymc3
from posteriordb import PosteriorDatabase
my_pdb = PosteriorDatabase()
posterior = my_pdb.posterior("eight_schools")
python_code_path = posterior.model.code_file_path("pymc3")
python_lib = importlib.import_module(python_code_path)
data = posterior.data.values()
model = python_lib.model(data)
draws = pymc3.sample(model=model)
A potential way to use it from R could be
po <- posterior("eight_schools", pdb = pd)
dat <- get_data(po)
pymc_code_path <- model_code(po, "pymc3")
draws <- run_pymc(pymc_code_path, dat)
where run_pymc would essentially call a python command line script that takes model code path and data json, runs the inference and output the result as json (or something else).
Of course run_pymc(po) would be doable as well as we can get code path and data from the posterior.
I didn't comment any of the code, if you want clarifications feel free to ask!
Yes. This looks exactly how I would like to have it!
Although, I think run_pymc3 should probably not be part of posteriordb.
I agree that run_pymc3 doesn't have to be in posteriordb.
I will start out by including 8-schools for all PPFs:
Pyro: https://forum.pyro.ai/t/hierarchical-models-and-eight-schools-example/362
JAGS: http://rstudio-pubs-static.s3.amazonaws.com/15236_9bc0cd0966924b139c5162d7d61a2436.html
Tensorflow Probability: https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Eight_Schools.ipynb
PyMC (see Eeros stuff above).
Alright. When implementing JAGS Iusing run_jags() I realized that we need to automatically check that all models are equal with respect to the log density for a set of say 100 parameter values. This can be a little tricky between models implemented in R respectively Python. Especially since we want the tests to be run on travis every time.
Ideally we would like to have logp(po, params, "stan"), logp(po, params, "tfp") etc.
I promised to clarify the issue for @mtwest2718
So we want to include two things with this issue:
- Include Tensorflow code and Pyro code for the 8-schools model
- Check that the log density for the models is identical (or proportional) between TFP, Pyro, PyMC3 and Stan for say 100 parameter draws from the reference posterior.
Maybe also include the test in 2. as a continuous integration test. I'm not sure exactly how to do that so that is probably a separate issue since it involves including PyMC3, Pyro, TFP etc in Travis, something that might become a maintenance hell.