nutpie
Nutpie doesn't compute element-wise log-likelihood
The elemwise log-likelihood is not stored in the InferenceData that nutpie returns, even when asking for it. The following, for instance, doesn't error out, but it doesn't add a `log_likelihood` group to the trace (whereas it does when using the default PyMC sampler):
```python
import numpy as np
import pymc as pm

y = np.array([28, 8, -3, 7, -1, 1, 18, 12])
sigma = np.array([15, 10, 16, 11, 9, 11, 10, 18])
J = len(y)

with pm.Model() as pooled:
    mu = pm.Normal("mu", 0, sigma=10)
    obs = pm.Normal("obs", mu, sigma=sigma, observed=y)
    trace_p = pm.sample(nuts_sampler="nutpie", idata_kwargs={"log_likelihood": True})  # doesn't store
    trace_p_ = pm.sample(idata_kwargs={"log_likelihood": True})  # does store
```
In PyMC it's not that big of a deal (although it adds friction to the user workflow), as one can just do:

```python
with pooled:
    pm.compute_log_likelihood(trace_p)
```
But that may be a small issue for Bambi users, who are usually less advanced (cc @tomicapretto). They'd have to do `pooled.compute_log_likelihood(trace_p)`, which takes much more time to compute.
I was searching for this exact issue when using Bambi alongside nutpie. If I use the following code, the log-likelihood is not calculated:
```python
model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True})
```
I also tried the following:
```python
model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True, "extend_inference_data": True})
```
and this did not work either.
Thanks for opening up this issue.
Glad this was useful @djlacombe. I'm actually wondering if this was solved by a recent PR on Bambi (it still shouldn't work when using PyMC directly) 🤔
Could you update to the latest version of Bambi, try it out, and report back please?
Thanks for reminding me. Sounds like we should add the elemwise logp values as deterministics (based on an argument to the compilation function). Then we can sort the results into the correct arviz section after sampling.
#74 is related.
@AlexAndorra
I updated Bambi to version 0.15 and ran the two lines of code from my original post separately, i.e.:
```python
model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True})
```
and
```python
model.fit(nuts_sampler="nutpie", draws=20000, cores=10, idata_kwargs={"log_likelihood": True, "extend_inference_data": True})
```
which sampled just fine but did not produce the log-likelihood in the results structure. I think it's actually being calculated, because the sampling finished pretty quickly; it's just not being saved.
Ok, so that means it's on nutpie's side, as @aseyboldt was saying:
> Sounds like we should add the elemwise logp values as deterministics (based on an argument to the compilation function). Then we can sort the results into the correct arviz section after sampling.
That doesn't sound too hard. @djlacombe, do you feel like trying out a PR?
@AlexAndorra I appreciate the confidence, but I'm not sure if I have the skills to do this.
That's what we're here for, @djlacombe -- to answer your questions and guide you along the way. Unless @aseyboldt thinks it's too complex for a beginner-level issue.
I am also struggling with this issue. I am using the nutpie sampler and I would like to perform some model comparison. Is it this issue that was mentioned above:

> Sounds like we should add the elemwise logp values as deterministics (based on an argument to the compilation function). Then we can sort the results into the correct arviz section after sampling.
I am more of a user and feel like I'm in the same boat as @djlacombe in terms of having the right skills. Is the above issue about this function:
https://github.com/pymc-devs/nutpie/blob/3328f64c9c4fe415688e53867227bb1024409fd9/python/nutpie/compile_pymc.py#L447
I am not sure what it means to add something as a deterministic outside of the model declaration.
TBH I am not really sure where to begin with this issue but I would love to be able to perform model comparison sometime soon.
@jabrantley You should just be able to compute the elementwise density with this right now:

```python
with model:
    pm.compute_log_likelihood(trace)
```
Or is this not working for you somehow? The issue here is more that it would be convenient if nutpie had an option to compute this while sampling, but nothing should be stopping you from running model comparison as it is.
@aseyboldt
I think the issue is about getting this to work in Bambi, i.e., using nutpie within Bambi and having it return the logp values.
I'm willing to work on this if I can get a bit of direction.
I think with Bambi this should do the trick: https://bambinos.github.io/bambi/api/Model.html#bambi.Model.compute_log_likelihood
As to an implementation in nutpie itself:
We could add an argument `compute_log_likelihood` to `nutpie.compile_pymc_model`. That function can then pass the info on to `_make_functions`. (This might be one of the worst parts of the nutpie codebase, sorry about that; if you want to clean it up a bit while adding this feature, I definitely wouldn't mind.)
This function compiles two functions: the logp function, which we don't care about here, and the expand function, which takes a position on the unconstrained space and returns all the individual variables on the constrained space plus all the deterministics. We want to add the log likelihoods there as additional variables.
The sampler will call that function once for each draw it produces, and write the results into the trace object.
We then need to add some logic here to sort the log_likelihood into the correct group in arviz.
Ah yes, @aseyboldt, that does work. Sorry, I did not see that in @AlexAndorra's original post. I have always passed it through `idata_kwargs`, so I didn't even think to compute it directly. Thanks!
No worries :-)