Computing likelihoods
For an application of Bambi I am working on, I need to do the following.
I have a Bambi model with some parameter $\theta$, a prior $p(\theta)$, and a likelihood $p(y\mid x,\theta)$. After fitting on some training data, I obtain parameter samples $\theta_{1:T}$. Given a new observation $(x^\star,y^\star)$, I need to compute $p(y^\star\mid x^\star, \theta_t)$ for all $t$ (and in fact at multiple possible values of $y^\star$). This should be as fast as possible, since I need this computation for a rather large number of points.
As far as I can tell, there is no easy way to do this. So far I have resorted to implementing it by hand for each model specification, but that lacks flexibility. Is anyone aware of a proper way to accomplish this with Bambi directly? If not, would it be possible to implement it in the package?
Thanks all in advance!
I believe you can use the predict function. Just construct a dataframe with all the ids and covariate values and feed it through predict. It is very flexible, and you can even predict on unseen groups.
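(For what it's worth, a minimal sketch of that predict-based workflow might look like the following; the formula, the toy data, and the `kind="pps"` argument are assumptions about a typical Bambi setup, not anything taken from this thread.)

```python
import bambi as bmb
import numpy as np
import pandas as pd

# toy training data (made up for illustration)
rng = np.random.default_rng(0)
train = pd.DataFrame({"x1": rng.normal(size=100)})
train["y"] = 1 + 0.5 * train["x1"] + rng.normal(scale=0.1, size=100)

model = bmb.Model("y ~ x1", train)
idata = model.fit(draws=500, chains=2)

# new covariate values; predict() attaches posterior predictive draws to idata
new_data = pd.DataFrame({"x1": [0.5, -1.0]})
model.predict(idata, kind="pps", data=new_data)
print(idata.posterior_predictive)
```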
@zwelitunyiswa Unfortunately, as far as I know that doesn't work. I am not looking to sample the posterior predictive at an unseen point $x^\star$. What I want is to compute the likelihood of a given (new) data point under the parameter samples obtained by fitting on some training set.
I think this could be (easily?) implemented using the pymc.compute_log_likelihood function.
@DanielRobertNicoud I want to check if I understand your problem. Using a training dataset you get draws from a posterior distribution. Then you have, let's say, a test dataset where you know both the value of the predictors and the response. Using the draws of the posterior you got with the training data, you want to compute the likelihood for the values of the predictors and responses in the test set. Is my understanding correct?
If that's correct, I think it's not extremely simple but it should be possible. I'll wait for your input before trying something.
@tomicapretto Yes, that seems correct. As a simplified example, suppose I am doing a Bayesian linear regression $y = x^T\beta + \epsilon$ with $\epsilon\sim N(0,\sigma^2)$ with $\sigma$ known. Fitting the model will give me samples $\beta_1,\ldots,\beta_T$. Then given a new point $(x^\star,y^\star)$ what I want to compute is $\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{y^\star - (x^\star)^T\beta_t}{\sigma}\right)^2\right)$ for all $t$. Of course, I'd like to do this in such a way that any model and likelihood function is supported.
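(For concreteness, a vectorized version of this by-hand computation could look like the sketch below; the array names and the stand-in draws are purely illustrative and not taken from Bambi or PyMC.)

```python
import numpy as np
import scipy.stats as sps

sigma = 0.1                                      # known noise scale
beta_draws = np.random.normal(size=(1000, 2))    # stand-in for T posterior draws of beta
x_star = np.array([0.5, 1.0])                    # new covariates (x1, intercept)
y_star = 0.9                                     # new response value

mu = beta_draws @ x_star                         # predicted mean per posterior draw
lik = sps.norm(loc=mu, scale=sigma).pdf(y_star)  # p(y* | x*, beta_t) for every t
```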
I am trying to implement this with raw pymc; if I manage, I can share a minimal working example if that helps!
@tomicapretto The following works in pure pymc (as you can see, I test against manually computed values).
```python
import pymc
import pandas as pd
import numpy as np
import arviz as az
import scipy.stats as sps

# generate some gaussian data for training
x_train = pd.DataFrame(np.random.normal(size=(200, 1)), columns=["x1"])
x_train["intercept"] = 1.
y_train = pd.Series(1 + .5 * x_train.x1 + np.random.normal(scale=0.1, size=200), name="y")

# single testing point
x0 = .5
x_test = pd.DataFrame({"x1": [x0], "intercept": [1.]})
y_test = pd.Series(np.zeros(x_test.shape[0]), name="y")

coords = {"coeffs": x_train.columns}

with pymc.Model(coords=coords) as model:
    # data containers
    X = pymc.MutableData("X", x_train)
    y = pymc.MutableData("y", y_train)
    # priors
    b = pymc.Normal("b", mu=0, sigma=1, dims="coeffs")
    sigma = pymc.HalfCauchy("sigma", beta=10)
    # linear model
    mu = pymc.math.dot(X, b)
    # likelihood
    likelihood = pymc.Normal("obs", mu=mu, sigma=sigma, observed=y)
    # fit
    idata = pymc.sample(return_inferencedata=True, tune=200, draws=20, chains=1)
    # testing: swap in the test point and evaluate the log likelihood per draw
    pymc.set_data({"X": x_test.to_numpy(), "y": y_test.to_numpy()})
    out = pymc.compute_log_likelihood(
        az.InferenceData(posterior=idata.posterior),  # keep min amount possible of info
        var_names=["obs"],
    )

# manual check: log N(y*=0 | intercept + x0 * beta_t, sigma_t) for every posterior draw
display([
    np.log(sps.norm(loc=0., scale=sigma).pdf(intercept + x0 * beta))
    for (beta, intercept), sigma in zip(
        idata.posterior.b.to_numpy().reshape(-1, 2),
        idata.posterior.sigma.to_numpy().flatten()
    )
])
display(out.log_likelihood.obs)
```
Thanks for the detailed example. I can't check it right now, but I'll look at it soon-ish.
@tomicapretto Hi, did you have time to take a look at this?
Hi @DanielRobertNicoud, sorry for the delay! With the holidays and work, I couldn't find time to work on Bambi. Now I found a gap and decided to review this issue. It turns out it was not hard to add a compute_log_likelihood method to the Model class. Have a look at #769 and let me know if it works for your use case. You'll need to install Bambi from the branch in that PR.
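(For readers following along, a hypothetical usage sketch of such a method is below; the exact signature, argument names, and whether it modifies the InferenceData in place are assumptions to be checked against the PR branch.)

```python
import bambi as bmb
import numpy as np
import pandas as pd

# toy training and test data (made up for illustration)
rng = np.random.default_rng(0)
train = pd.DataFrame({"x1": rng.normal(size=200)})
train["y"] = 1 + 0.5 * train["x1"] + rng.normal(scale=0.1, size=200)
test = pd.DataFrame({"x1": [0.5], "y": [0.9]})

model = bmb.Model("y ~ x1", train)
idata = model.fit()

# hypothetical call: evaluate log p(y* | x*, theta_t) at the test rows for every draw
out = model.compute_log_likelihood(idata, data=test)
```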
Hi @DanielRobertNicoud did you have time to check it?