UpdatingPriors
Notebook URL: https://docs.pymc.io/en/v3/pymc-examples/examples/pymc3_howto/updating_priors.html
For the following code section, to access the posterior samples I had to change the indexing to `trace.posterior.alpha.mean(axis=0)`, etc. I am not sure whether this is correct. Here is the code:
```python
for _ in range(10):
    # generate more data
    X1 = np.random.randn(size)
    X2 = np.random.randn(size) * 0.2
    Y = alpha_true + beta0_true * X1 + beta1_true * X2 + np.random.randn(size)

    model = Model()
    with model:
        # Priors are posteriors from previous iteration
        alpha = from_posterior("alpha", trace.posterior.alpha.mean(axis=0))  # CA
        beta0 = from_posterior("beta0", trace.posterior.beta0.mean(axis=0))  # CA
        beta1 = from_posterior("beta1", trace.posterior.beta1.mean(axis=0))  # CA

        # Expected value of outcome
        mu = alpha + beta0 * X1 + beta1 * X2

        # Likelihood (sampling distribution) of observations
        Y_obs = Normal("Y_obs", mu=mu, sigma=1, observed=Y)

        # draw 1000 posterior samples
        trace = sample(1000)

    traces.append(trace)
```
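For context, the notebook's `from_posterior` helper fits a KDE to the posterior samples and wraps the result in an `Interpolated` prior (the traceback further down shows its `gaussian_kde` call). Here is a scipy-only sketch of the grid/density part; the function name `kde_support` is mine, and the real helper returns the `pm.Interpolated` prior rather than the arrays:

```python
import numpy as np
from scipy import stats


def kde_support(samples, n=100):
    """Grid and density a from_posterior-style helper would feed to
    pm.Interpolated (sketch only, not the notebook's exact code)."""
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, n)
    y = stats.gaussian_kde(samples)(x)
    # what was never sampled should have a small probability but not 0
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return x, y


x, y = kde_support(np.random.randn(500))
print(x.shape, y.shape)  # (102,) (102,)
```

The key point for what follows: `gaussian_kde` needs an array of samples, not a summary statistic.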
But I got the following error on `trace = sample(1000)`:
```
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'alpha_interval__': array(nan), 'beta0_interval__': array(nan), 'beta1_interval__': array(nan)}

Initial evaluation results:
{'alpha': nan, 'beta0': nan, 'beta1': nan, 'Y_obs': nan}
```
I would do:

```python
...
post_means = trace.posterior.mean(("chain", "draw"))
...
alpha = from_posterior("alpha", post_means.alpha)
# or
alpha = from_posterior("alpha", post_means.alpha.values.flatten())
```
Potentially useful references:
- https://oriolabrilpla.cat/python/arviz/pymc/xarray/2022/06/07/pymc-arviz.html
- https://cluhmann.github.io/inferencedata/
- https://python.arviz.org/en/latest/getting_started/WorkingWithInferenceData.html
@OriolAbril, I got the following error when testing your suggestion:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_10784/1212275851.py in <module>
     10 # Priors are posteriors from previous iteration
     11 post_means = trace.posterior.mean(("chain", "draw"))
---> 12 alpha = from_posterior("alpha", post_means.alpha)
     13 beta0 = from_posterior("beta0", post_means.beta0)
     14 beta1 = from_posterior("beta1", post_means.beta1)

~\AppData\Local\Temp/ipykernel_10784/45955940.py in from_posterior(param, samples)
      3     width = smax - smin
      4     x = np.linspace(smin, smax, 100)
----> 5     y = stats.gaussian_kde(samples)(x)
      6
      7     # what was never sampled should have a small probability but not 0,

~\anaconda3\lib\site-packages\scipy\stats\kde.py in __init__(self, dataset, bw_method, weights)
    191         self.dataset = atleast_2d(asarray(dataset))
    192         if not self.dataset.size > 1:
--> 193             raise ValueError("`dataset` input should have multiple elements.")
    194
    195         self.d, self.n = self.dataset.shape

ValueError: `dataset` input should have multiple elements.
```
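That error is easy to reproduce in isolation, without PyMC: `scipy.stats.gaussian_kde` accepts an array of samples but refuses an input with a single element, which is exactly what a mean over both `chain` and `draw` produces. A minimal repro (my own toy data, not the notebook's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

stats.gaussian_kde(samples)  # fine: a full array of samples

try:
    # a single summary value, like post_means.alpha above
    stats.gaussian_kde(np.array([samples.mean()]))
    raised = False
except ValueError as err:
    raised = True
    print(err)  # `dataset` input should have multiple elements.
```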
Oh, I see. I was only trying to provide a working alternative to the indexing and averaging code, without looking at what it was doing.
You are taking the mean, which reduces your posterior samples to a single summary value, and then trying to fit a KDE to it, which is impossible. The original code doesn't take the mean. It looks like you want a 1-d array with the chain and draw dimensions stacked together. For that you can use:
```python
post = idata.posterior.stack(sample=("chain", "draw"))
...
alpha = from_posterior("alpha", post.alpha)
```