ENH: Add an explicit dummy prior for predictive modeling
Before
import pymc as pm
import numpy as np

with pm.Model() as m:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    obs = pm.Normal('obs', mu, sigma, observed=np.random.normal(size=(100,)))
    idata = pm.sample()

# Do a "predictive model", changing the mean
with pm.Model() as predictive_model:
    mu = pm.Normal('new_mu', mu=10, sigma=1)
    sigma = pm.Flat('sigma')
    obs = pm.Normal('obs', mu, sigma)  # observed variable, re-declared so it is resampled
    idata_oos = pm.sample_posterior_predictive(idata, predictions=True)
After
import pymc as pm
import numpy as np

with pm.Model() as m:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    obs = pm.Normal('obs', mu, sigma, observed=np.random.normal(size=(100,)))
    idata = pm.sample()

# Do a "predictive model", changing the mean
with pm.Model() as predictive_model:
    mu = pm.Normal('new_mu', mu=10, sigma=1)
    sigma = pm.FromIData('sigma')
    obs = pm.Normal('obs', mu, sigma)  # observed variable, re-declared so it is resampled
    idata_oos = pm.sample_posterior_predictive(idata, predictions=True)
Context for the issue:
It is extremely convenient to use pm.Flat as a "dummy distribution" when doing predictive modeling, as shown above. Because pm.Flat has no random method, pm.sample_posterior_predictive is guaranteed to raise an error if something goes wrong with name matching between the variables in idata and those declared in predictive_model. This is clearly an off-label use of pm.Flat, though. It's also not at all obvious why someone would want to do this without being in the know; the resulting code is not readable.
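
For concreteness, a minimal sketch of the failure mode that makes pm.Flat useful here (the exact exception and message may vary across PyMC versions):

import pymc as pm

# Flat has no forward-sampling implementation, so any attempt to draw
# from it raises instead of silently producing values. If a variable
# name in the predictive model does not match a name in idata,
# sample_posterior_predictive falls back to forward sampling and fails loudly.
try:
    pm.draw(pm.Flat.dist())
except NotImplementedError as err:
    print(err)  # e.g. "Cannot sample from flat variable"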
I propose adding a dummy distribution specifically for this purpose, one that would make it obvious to a reader which variables are being targeted for sampling from the idata, and which are being given new values. I don't have any great ideas about the name, except to avoid pm.FromPosterior, which would suggest to users that it could be used to do some kind of iterative sampling (like what pmx.prior_from_idata does).
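
A minimal sketch of what this could look like, assuming the FromIData name from the example above (the name and any API details here are placeholders, not a settled design): it could simply reuse pm.Flat under a more descriptive name, since Flat already provides the desired no-random-method behavior.

import pymc as pm

def FromIData(name, **kwargs):
    # Placeholder sketch: delegate to pm.Flat, which already lacks a
    # random method, so sample_posterior_predictive raises if `name`
    # is not found in idata. A real implementation could instead be a
    # proper Distribution subclass with a clearer, purpose-built error message.
    return pm.Flat(name, **kwargs)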