
Nutpie doesn't compute element-wise log-likelihood

Open · AlexAndorra opened this issue 1 year ago · 15 comments

The elementwise log-likelihood is not stored in the InferenceData that nutpie returns, even when asking for it. The following, for instance, doesn't error out, but it doesn't add a log_likelihood group to the trace (whereas the default PyMC sampler does):

y = np.array([28, 8, -3, 7, -1, 1, 18, 12])
sigma = np.array([15, 10, 16, 11, 9, 11, 10, 18])
J = len(y)

with pm.Model() as pooled:
    mu = pm.Normal("mu", 0, sigma=10)
    obs = pm.Normal("obs", mu, sigma=sigma, observed=y)

    trace_p = pm.sample(nuts_sampler="nutpie", idata_kwargs={"log_likelihood": True}) # doesn't store
    trace_p_ = pm.sample(idata_kwargs={"log_likelihood": True}) # does store

In PyMC it's not that big of a deal (although it adds friction to the user workflow), as one can just do:

with pooled:
    pm.compute_log_likelihood(trace_p)

But that may be a small issue for Bambi users, who are usually less advanced (cc @tomicapretto). They'd have to call pooled.compute_log_likelihood(trace_p) themselves, which takes much more time to compute.
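For readers unfamiliar with the term, here is a minimal numpy sketch (not from the thread; the function name and draw shapes are illustrative) of what the elementwise log-likelihood group holds for the pooled model above: one Normal log-density per observation, per posterior draw.

```python
import numpy as np

def normal_loglike(y, mu, sigma):
    """Elementwise Normal log-density log N(y | mu, sigma)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

y = np.array([28, 8, -3, 7, -1, 1, 18, 12])
sigma = np.array([15, 10, 16, 11, 9, 11, 10, 18])

# fake posterior draws of mu, in the usual (chain, draw, 1) layout
mu_draws = np.zeros((4, 1000, 1))

ll = normal_loglike(y, mu_draws, sigma)
print(ll.shape)  # (4, 1000, 8): one value per chain, draw, and observation
```

This per-observation array is exactly what arviz needs for LOO/WAIC model comparison, which is why losing it matters.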

AlexAndorra avatar Oct 08 '24 10:10 AlexAndorra

I was searching for this exact issue when using Bambi alongside nutpie. If I use the following code, the log-likelihood is not calculated:

model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True})

I also tried the following:

model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True, "extend_inference_data": True})

and this did not work either.

Thanks for opening up this issue.

djlacombe avatar Jan 16 '25 17:01 djlacombe

Glad this was useful, @djlacombe. I'm actually wondering if this was solved by a recent PR on Bambi (it still wouldn't work when using PyMC directly) 🤔

Could you update to the latest version of Bambi, try it out, and report back, please?

AlexAndorra avatar Jan 22 '25 17:01 AlexAndorra

Thanks for reminding me. Sounds like we should add the elemwise logp values as deterministics (based on an argument to the compilation function). Then we can sort the results into the correct arviz section after sampling.

aseyboldt avatar Jan 22 '25 18:01 aseyboldt

#74 is related.

aseyboldt avatar Jan 22 '25 18:01 aseyboldt

@AlexAndorra

I updated Bambi to version 0.15 and ran the two lines of code from my original post separately, i.e.:

model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True})

and

model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True, "extend_inference_data": True})

which sampled just fine but did not produce the log-likelihood in the results structure. I think it's actually being calculated, since the sampling finished pretty quickly; it's just not being saved.

djlacombe avatar Jan 22 '25 19:01 djlacombe

OK, so that means it's on nutpie's side, as @aseyboldt was saying.

AlexAndorra avatar Jan 22 '25 20:01 AlexAndorra

Sounds like we should add the elemwise logp values as deterministics (based on an argument to the compilation function). Then we can sort the results into the correct arviz section after sampling.

That doesn't sound too hard. @djlacombe, do you feel like trying out a PR?

AlexAndorra avatar Jan 22 '25 20:01 AlexAndorra

@AlexAndorra I appreciate the confidence, but I'm not sure if I have the skills to do this.

djlacombe avatar Jan 23 '25 16:01 djlacombe

That's what we're here for, @djlacombe -- to answer your questions and guide you along the way. Unless @aseyboldt thinks it's too complex for a beginner-level issue.

AlexAndorra avatar Jan 23 '25 19:01 AlexAndorra

I am also struggling with this issue. I am using the nutpie sampler and I would like to perform some model comparison. Regarding the suggestion mentioned above:

Sounds like we should add the elemwise logp values as a deterministics (based on an argument to the compilation function). Then we can sort the results into the correct arviz section after sampling.

I am more of a user and feel like I'm in the same boat as @djlacombe in terms of having the right skills. Is the above suggestion about this function:

https://github.com/pymc-devs/nutpie/blob/3328f64c9c4fe415688e53867227bb1024409fd9/python/nutpie/compile_pymc.py#L447

I am not sure what it means to add something as a deterministic outside of the model declaration.

TBH I am not really sure where to begin with this issue but I would love to be able to perform model comparison sometime soon.

jabrantley avatar Mar 10 '25 15:03 jabrantley

@jabrantley You should just be able to compute the elementwise density with this right now:

with model:
    pm.compute_log_likelihood(trace)

Or is this not working for you somehow? The issue here is more that it would be convenient if nutpie had an option to compute this while sampling, but nothing should be stopping you from running model comparison as it is.

aseyboldt avatar Mar 10 '25 15:03 aseyboldt

@aseyboldt

I think the remaining issue is making this work in Bambi, i.e., using nutpie within Bambi and having it return the logp values.

I'm willing to work on this if I can get a bit of direction.

djlacombe avatar Mar 10 '25 15:03 djlacombe

I think with bambi this should do the trick: https://bambinos.github.io/bambi/api/Model.html#bambi.Model.compute_log_likelihood

As for an implementation in nutpie itself: we could add a compute_log_likelihood argument to nutpie.compile_pymc_model, which can pass that information on to _make_functions. (This might be one of the worst parts of the nutpie codebase, sorry about that, and if you want to clean it up a bit while adding this feature I definitely wouldn't mind.) This function compiles two functions: the logp function, which we don't care about here, and the expand function, which takes a position in the unconstrained space and returns all the individual variables on the constrained space plus all the deterministics. We want to add the log likelihoods there as additional variables. The sampler will call that function once for each draw it produces and write the results into the trace object.

We then need to add some logic here to sort the log_likelihood into the correct group in arviz.

aseyboldt avatar Mar 10 '25 16:03 aseyboldt

Ah yes, @aseyboldt, that does work. Sorry, I did not see that in @AlexAndorra's original post. I have always passed it through idata_kwargs, so I didn't even think to compute it directly. Thanks!

jabrantley avatar Mar 10 '25 18:03 jabrantley

No worries :-)

aseyboldt avatar Mar 10 '25 18:03 aseyboldt