
Computing likelihoods

Open · DanielRobertNicoud opened this issue 1 year ago · 10 comments

For an application of Bambi that I am working on, I need to do the following.

I have a Bambi model with some parameter $\theta$, a prior $p(\theta)$, and a likelihood $p(y\mid x,\theta)$. After getting some training data, fitting gives me parameter samples $\theta_{1:T}$. Given a new observation $(x^\star,y^\star)$, I need to compute $p(y^\star\mid x^\star, \theta_t)$ for all $t$ (and actually at multiple possible values of $y^\star$). This should be as fast as possible, as I need this computation for a rather large number of points.

As far as I can tell, there is no easy way of doing this. What I have resorted to so far is implementing it by hand for each model specification, but that lacks flexibility. Is anyone aware of a proper way to accomplish what I described above using Bambi directly? If not, would it be possible to implement it in the package?

Thanks all in advance!

DanielRobertNicoud avatar Dec 12 '23 12:12 DanielRobertNicoud

I believe you can use the predict function. Just construct a dataframe with all the ids and covariate values and feed it through predict. It is very flexible; you can even predict on unseen groups. A minimal sketch is below.
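
(For illustration, a rough sketch of that suggestion, assuming an already fitted Bambi model named model with inference data idata; the dataframe columns here are hypothetical, and kind="pps" requests posterior predictive samples in the Bambi versions current at the time:)

import pandas as pd

# hypothetical new points with the same covariate columns the model was fit on
new_data = pd.DataFrame({"x1": [0.5, 1.0], "group": ["A", "unseen_group"]})

# draws posterior predictive samples at the new points and stores them in idata
model.predict(idata, kind="pps", data=new_data)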

zwelitunyiswa avatar Dec 12 '23 13:12 zwelitunyiswa

@zwelitunyiswa Unfortunately, as far as I know this doesn't work. I am not looking to sample the posterior predictive at an unseen point $x^\star$. What I want is to compute the likelihood of a given (new) data point under the parameter samples obtained by fitting on a training set.

DanielRobertNicoud avatar Dec 12 '23 13:12 DanielRobertNicoud

I think this could be easily (?) implemented using the pymc.compute_log_likelihood function.

DanielRobertNicoud avatar Dec 12 '23 13:12 DanielRobertNicoud

@DanielRobertNicoud I want to check if I understand your problem. Using a training dataset you get draws from a posterior distribution. Then you have, let's say, a test dataset where you know both the value of the predictors and the response. Using the draws of the posterior you got with the training data, you want to compute the likelihood for the values of the predictors and responses in the test set. Is my understanding correct?

If that's correct, I think it's not extremely simple, but it should be possible. I'll wait for your input before trying something.

tomicapretto avatar Dec 13 '23 02:12 tomicapretto

@tomicapretto Yes, that seems correct. As a simplified example, suppose I am doing a Bayesian linear regression $y = x^T\beta + \epsilon$ where $\epsilon\sim N(0,\sigma^2)$ and $\sigma$ is known. Fitting the model gives me samples $\beta_1,\ldots,\beta_T$. Then, given a new point $(x^\star,y^\star)$, what I want to compute is $\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{y^\star - (x^\star)^T\beta_t}{\sigma}\right)^2\right)$ for all $t$. Of course, I'd like to do this in a way that supports any model and likelihood function.
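
(A quick vectorized sketch of that per-draw computation with scipy; the array names and values below are purely illustrative stand-ins:)

import numpy as np
import scipy.stats as sps

sigma = 0.1                                # the known noise scale
betas = np.random.normal(size=(1000, 2))   # stand-in for T = 1000 posterior draws of beta
x_star = np.array([0.5, 1.0])              # new covariate vector (last entry = intercept term)
y_star = 0.7                               # new response value

# p(y* | x*, beta_t) for every draw t, vectorized over t
mus = betas @ x_star
liks = sps.norm.pdf(y_star, loc=mus, scale=sigma)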

I am trying to implement this with raw pymc; if I manage, I can share a minimal working example in case that helps!

DanielRobertNicoud avatar Dec 13 '23 10:12 DanielRobertNicoud

@tomicapretto The following works in pure pymc (as you can see, I test against manually computed values).

import pymc
import pandas as pd
import numpy as np
import arviz as az
import scipy.stats as sps

# generate some gaussian data for training
x_train = pd.DataFrame(np.random.normal(size=(200, 1)), columns=["x1"])
x_train["intercept"] = 1.
y_train = pd.Series(1 + .5 * x_train.x1 + np.random.normal(scale=0.1, size=200), name="y")

# single testing point
x0 = .5
x_test = pd.DataFrame({"x1": [x0], "intercept": [1.]})
y_test = pd.Series(np.zeros(x_test.shape[0]), name="y")

coords = {"coeffs": x_train.columns}

with pymc.Model(coords=coords) as model:
    # data containers
    X = pymc.MutableData("X", x_train)
    y = pymc.MutableData("y", y_train)
    # priors
    b = pymc.Normal("b", mu=0, sigma=1, dims="coeffs")
    sigma = pymc.HalfCauchy("sigma", beta=10)
    # linear model
    mu = pymc.math.dot(X, b)
    # likelihood
    likelihood = pymc.Normal("obs", mu=mu, sigma=sigma, observed=y)
    
    # fit (deliberately small tune/draws to keep the example fast)
    idata = pymc.sample(return_inferencedata=True, tune=200, draws=20, chains=1)
    
    # testing: swap in the test point, then compute the pointwise
    # log-likelihood of the observed variable for every posterior draw
    pymc.set_data({"X": x_test.to_numpy(), "y": y_test.to_numpy()})
    out = pymc.compute_log_likelihood(
        az.InferenceData(posterior=idata.posterior),   # keep only the posterior group
        var_names=["obs"]
    )

# manual check: y_test is 0 and N(0; mu, sigma) = N(mu; 0, sigma) by symmetry,
# so evaluating the pdf at mu = intercept + x0 * beta with loc=0 is equivalent
# (display() assumes an IPython/Jupyter session; use print() in a plain script)
display([
    np.log(sps.norm(loc=0., scale=sigma).pdf(intercept + x0 * beta))
    for (beta, intercept), sigma in zip(
        idata.posterior.b.to_numpy().reshape(-1, 2),   # column order follows x_train: [x1, intercept]
        idata.posterior.sigma.to_numpy().flatten()
    )
])
display(out.log_likelihood.obs)

DanielRobertNicoud avatar Dec 13 '23 13:12 DanielRobertNicoud

Thanks for the detailed example! I can't check it right now, but I'll take a look soon-ish.

tomicapretto avatar Dec 13 '23 17:12 tomicapretto

@tomicapretto Hi, did you have time to take a look at this?

DanielRobertNicoud avatar Dec 22 '23 16:12 DanielRobertNicoud

Hi @DanielRobertNicoud, sorry for the delay! With the holidays and work, I couldn't find time for Bambi. I've now found a gap and decided to review this issue. It turns out it was not hard to add a compute_log_likelihood method to the Model class. Have a look at #769 and let me know if it works for your use case; you'll need to install Bambi from the branch in that PR.
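
(For anyone reading along, usage of the new method should look roughly like the sketch below; the exact signature is assumed from the PR under discussion and may change before it is merged:)

import bambi as bmb
import numpy as np
import pandas as pd

# toy training data
df = pd.DataFrame({"x1": np.random.normal(size=100)})
df["y"] = 1 + 0.5 * df["x1"] + np.random.normal(scale=0.1, size=100)

model = bmb.Model("y ~ x1", df)
idata = model.fit()

# new observations whose pointwise log-likelihood we want under each posterior draw
new_df = pd.DataFrame({"x1": [0.5], "y": [0.0]})
model.compute_log_likelihood(idata, data=new_df)  # adds a log_likelihood group to idata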

tomicapretto avatar Jan 05 '24 13:01 tomicapretto

Hi @DanielRobertNicoud, did you have time to check it?

tomicapretto avatar Feb 18 '24 19:02 tomicapretto