# TODO: figure out how to run posterior predictive simulations in Bambi
I don't know if this is helpful, but I noticed in a few of the notebooks you have notes to yourself about using Bambi to get a posterior predictive check, e.g.:

> TODO: Verify this is true
>
> Bambi does not have a way to make posterior predictive given a data frame of independent variables
## My attempt
I am no PyMC guru, but I started playing around with the possibilities. On the forums I found this answer, *How to predict new values on hold out data*:
I extracted information about the Bambi object manually. This could either be automated, or there may already be a function that gives more direct access to the underlying model object, much as you can use `get_stancode` with `brms`.
```python
import pymc3 as pm
from bambi import Model

# the underlying PyMC3 model behind the fitted Bambi model
pm_model = model.backend.model

# the prior parameters Bambi chose for each term
def get_priors(model: Model):
    return {x.name: x.prior.args for x in model.terms.values()}

print("vote ~ past_vote + inc")
print(get_priors(model))
pm.model_to_graphviz(pm_model)
```
Here I re-write the model manually, using the priors extracted from the Bambi model via `model.backend.model`. (The `data88` training frame isn't shown in the original snippet; below I assume it was built from the 1986/1988 columns analogously to `data90`.)
```python
import pandas as pd

# training data for 1988; assumed here to be built from the 1986/1988
# columns in the same way as data90 below
data88 = pd.DataFrame(dict(vote=congress["v88_adj"], past_vote=congress["v86_adj"], inc=congress["inc88"]))

# hold-out test data for 1990
data90 = pd.DataFrame(dict(vote=congress["v90_adj"], past_vote=congress["v88_adj"], inc=congress["inc90"]))

# model function that we fit and sample from
def model_factory(data):
    with pm.Model() as model:
        a = pm.Normal("Intercept", mu=0.543, sigma=0.367)
        b_pv = pm.Normal("past_vote", mu=0, sigma=0.558)
        b_inc = pm.Normal("inc", mu=0, sigma=0.118)
        vote_sd = pm.Uniform("vote_sd", lower=0, upper=data["vote"].std())
        mu = a + (b_pv * data["past_vote"]) + (b_inc * data["inc"])
        vote = pm.Normal("vote", mu=mu, sigma=vote_sd, observed=data["vote"])
    return model
```
```python
import arviz as az

with model_factory(data88):
    trace = pm.sample(draws=4000,
                      tune=700,
                      init=None,
                      start=None,
                      cores=2,
                      chains=2,
                      random_seed=1,
                      discard_tuned_samples=True,
                      compute_convergence_checks=True)

# swap in the hold-out data and sample from the posterior predictive
with model_factory(data90):
    ppc = pm.sample_posterior_predictive(trace)  # or whatever
    ppc = az.from_pymc3(posterior_predictive=ppc)
```
The `ppc` object is now an `arviz.InferenceData` object, and we can use the built-in `az.plot_ppc` function to visualise the posterior predictive distribution.
```python
import matplotlib.pyplot as plt

f, ax = plt.subplots()
az.plot_ppc(ppc, num_pp_samples=500, ax=ax)
```
We can also extract the table shown in Figure 10.7 (p. 143):
```python
df = (ppc.posterior_predictive
         .isel(chain=0)
         .drop("chain")
         .to_dataframe()
         .reset_index()
         .rename({"vote_dim_0": "district"}, axis=1))
pred90 = df.pivot(index="draw", columns="district", values="vote")
```
## Outstanding question
One thing I haven't worked out is that in the book (and with the `posterior_predict` function from `rstanarm`), the returned table (Figure 10.7, p. 143) also has columns with the estimates for $\sigma$ and the $\beta$ coefficients, whereas `pm.sample_posterior_predictive` does not return estimates of the model parameters, only the posterior predictions for the test data points.
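The parameter draws do live in `trace`, though, so one workaround may be to pull them out and set them alongside the predictions. A minimal sketch, assuming the `trace` and `pred90` objects from above (the column selection and layout are my guesses at the Figure 10.7 format):

```python
import pandas as pd

# Take the chain-0 parameter draws (to line up with pred90, which kept
# only chain 0) and concatenate them with the district-level predictions.
params = pd.DataFrame({
    "Intercept": trace.get_values("Intercept", chains=[0]),
    "past_vote": trace.get_values("past_vote", chains=[0]),
    "inc": trace.get_values("inc", chains=[0]),
    "sigma": trace.get_values("vote_sd", chains=[0]),
})
fig_10_7_style = pd.concat([params, pred90.reset_index(drop=True)], axis=1)
```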
I am sure you have already come up with a more elegant solution! Just posting in case this is helpful information.
As a Python user I am very grateful for this walkthrough of *Regression and Other Stories*! So thank you,
Tommy
---

If it helps with this: Bambi is now able to run prior/posterior predictive sampling.
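A minimal sketch of what that could look like, assuming the `data88`/`data90` frames from above (the `predict` keyword has changed across Bambi releases: older versions use `kind="pps"`, newer ones `kind="response"`, so check your installed version):

```python
import bambi as bmb

# Fit the same model directly in Bambi; recent releases return an
# arviz.InferenceData object from .fit()
model = bmb.Model("vote ~ past_vote + inc", data88)
idata = model.fit(draws=4000, chains=2, random_seed=1)

# Add posterior predictive draws for the hold-out 1990 data to idata
# (kind="pps" requests draws of the response, not just its mean)
model.predict(idata, kind="pps", data=data90)
ppc_90 = idata.posterior_predictive["vote"]
```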