posteriordb

Add structure for likelihood to compute predictive distributions based on draws

Status: Open. MansMeg opened this issue 4 years ago • 13 comments

In the model slot.

MansMeg, Dec 04 '19 18:12

Would this be an additional slot in the model info file like blr.info.json or in a model code file like blr.stan?

eerolinna, Dec 12 '19 14:12

I'm actually starting to think of this as a separate Stan function for now (in a separate Stan file). Then we can add it along the way.

MansMeg, Dec 14 '19 10:12

So you're thinking that blr.info.json would be something like this?

{
  "name": "blr",
  "keywords": [],
  "title": "A Bayesian linear regression model with vague priors",
  "description": "A Bayesian linear regression model with vague priors.",
  "urls": [],
  "model_code": {
    "stan": "models/stan/blr.stan"
  },
  "likelihood_code": {
    "stan": "models/stan/blr_likelihood.stan" 
  },
  "references": null,
  "added_date": "2019-11-29",
  "added_by": "Mans Magnusson"
}

eerolinna, Dec 14 '19 14:12

Yes. Exactly!

MansMeg, Dec 14 '19 15:12

With PyMC, no extra code is needed to compute predictive distributions:

posterior_predictive_samples = pymc.sample_posterior_predictive(draws, model=model)

where model is any PyMC model and draws are the posterior draws computed with PyMC. Source: https://docs.pymc.io/notebooks/posterior_predictive.html
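To make concrete what a separate likelihood file would have to encode, here is a framework-free sketch of posterior predictive draws for blr. It is not how PyMC implements sample_posterior_predictive; it assumes the blr likelihood is y ~ Normal(X beta, sigma) and that each posterior draw is a dict with keys "beta" and "sigma" (the function name and key names are hypothetical). Because only the draw values are used, the draws could come from Stan, PyMC, or any other sampler.

```python
import random
from typing import Any, Dict, List, Sequence


def blr_predictive_draws(
    draws: Sequence[Dict[str, Any]],
    X: Sequence[Sequence[float]],
    seed: int = 1,
) -> List[List[float]]:
    """Return one replicated data set y_rep (length N) per posterior draw.

    Assumes y[n] ~ Normal(X[n] . beta, sigma). The draws may come from any
    sampler; only their values are used.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    y_rep = []
    for draw in draws:
        beta, sigma = draw["beta"], draw["sigma"]
        row = []
        for x in X:
            mu = sum(xi * bi for xi, bi in zip(x, beta))  # linear predictor
            row.append(rng.gauss(mu, sigma))  # one simulated observation
        y_rep.append(row)
    return y_rep


# Example: N = 3 observations, D = 2 predictors, S = 2 posterior draws.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
draws = [{"beta": [0.2, 1.0], "sigma": 0.5}, {"beta": [0.1, 0.9], "sigma": 0.4}]
sims = blr_predictive_draws(draws, X)
print(len(sims), len(sims[0]))  # 2 3
```

The result has one row per posterior draw and one column per observation, which is the shape a predictive-draws API would naturally return.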

It seems backwards to include likelihood_code in the model info file when not all model implementations will require it.

Ideal solution

From my perspective, the ideal solution would look like this:

  1. Change blr.info.json to this:
{
  "name": "blr",
  "model_implementations": {
    "stan": "models/stan/blr.info.json",
    "pymc": "models/pymc/blr.info.json"
  }
}

(some slots like keywords have been omitted for the sake of clarity, but they still would remain here)

  2. models/stan/blr.info.json would be:
{
  "code": "models/stan/blr.stan",
  "likelihood": "models/stan/blr_likelihood.stan"
}
  3. models/pymc/blr.info.json would be:
{
  "code": "models/pymc/blr.py"
}

eerolinna, Dec 15 '19 18:12

I agree that this is probably the right way to go. Well spotted.

MansMeg, Dec 16 '19 07:12

How should we expose the likelihood/predictive distribution to users?

In other words, if I have a posterior object po <- posterior("8_schools"), what would the API for accessing the likelihood/predictive distribution look like?

Would it be something like stan_predictive_draws(po, posterior_draws)? Would we also want something like stan_likelihood(po, posterior_draws)? What would this return?

These could be generalized to predictive_draws(po, posterior_draws, framework = "stan") and the same for stan_likelihood.
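One possible answer to "what would this return?", sketched under stated assumptions: stan_likelihood could return an S x N matrix whose entry [s][n] is the pointwise log-likelihood log p(y_n | theta_s), one row per posterior draw. The function below (blr_log_likelihood, a hypothetical name) computes that matrix for blr in plain Python, assuming the same Normal(X beta, sigma) likelihood and dict-shaped draws with keys "beta" and "sigma".

```python
import math
from typing import Any, Dict, List, Sequence


def normal_lpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of Normal(mu, sigma) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def blr_log_likelihood(
    draws: Sequence[Dict[str, Any]],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
) -> List[List[float]]:
    """Return an S x N matrix: entry [s][n] is log p(y[n] | draw s).

    Assumes the blr likelihood y[n] ~ Normal(X[n] . beta, sigma).
    """
    out = []
    for draw in draws:
        beta, sigma = draw["beta"], draw["sigma"]
        out.append([
            normal_lpdf(yn, sum(xi * bi for xi, bi in zip(xn, beta)), sigma)
            for xn, yn in zip(X, y)
        ])
    return out
```

Returning pointwise values rather than a single sum keeps per-observation information, which is the input that leave-one-out cross-validation tools expect.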

eerolinna, Dec 17 '19 17:12

Yes, that would probably be a good idea

MansMeg, Dec 18 '19 07:12

Write a suggestion for the 8-schools example of how it should look and review it with Paul.

The structure change suggested by @eerolinna needs to be done before @jarnefeltoliver starts to fill out the db.

MansMeg, Dec 19 '19 11:12

I have now implemented it, but I think it is simpler to nest each implementation's JSON object directly in blr.info.json instead of pointing to a separate info file:

{
  "name": "blr",
  "model_implementations": {
    "stan": {
      "model_code": "models/stan/blr.stan",
      "likelihood_code": "models/stan/blr_likelihood.stan"
    },
    "pymc": {
      "model_code": "models/pymc/blr.py"
    }
  }
}
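A short sketch of how code could read the nested layout, assuming the file is parsed as plain JSON and that a missing "likelihood_code" key simply means that framework does not need one. likelihood_code_path is a hypothetical helper, not part of posteriordb.

```python
import json
from typing import Optional

# The nested info file proposed above, inlined as a string for the example.
BLR_INFO = """
{
  "name": "blr",
  "model_implementations": {
    "stan": {
      "model_code": "models/stan/blr.stan",
      "likelihood_code": "models/stan/blr_likelihood.stan"
    },
    "pymc": {
      "model_code": "models/pymc/blr.py"
    }
  }
}
"""


def likelihood_code_path(info: dict, framework: str) -> Optional[str]:
    """Return the likelihood file for `framework`, or None if it has none.

    A missing "likelihood_code" key is read as "this framework (e.g. PyMC)
    can derive predictive draws from the model code alone".
    """
    impl = info["model_implementations"].get(framework)
    if impl is None:
        raise KeyError(f"no {framework!r} implementation of {info['name']!r}")
    return impl.get("likelihood_code")


info = json.loads(BLR_INFO)
print(likelihood_code_path(info, "stan"))  # models/stan/blr_likelihood.stan
print(likelihood_code_path(info, "pymc"))  # None
```

Keeping likelihood_code optional per implementation addresses the earlier concern that not every framework needs it.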

MansMeg, Jan 03 '20 11:01

That's a good idea!

eerolinna, Jan 03 '20 12:01

Do we want to have an API that exposes the likelihood code? Something like

stan_likelihood_code_file(po)

that would return models/stan/blr_likelihood.stan for the blr posterior.

Or is it sufficient that we have something like

posterior_predictive_draws(po, framework = "stan")

that essentially keeps the likelihood code file as an implementation detail instead of public API?

Arguments for keeping the likelihood code file out of the public API

  • Smaller API surface is easier to learn, understand and maintain
  • If Stan ever adds a way to automatically compute predictive draws without a separate likelihood definition we can remove the likelihood code files without breaking anyone's code

Arguments for adding the likelihood code to the public API

  • Someone might need the likelihood code outside of computing predictive draws?

eerolinna, Jan 03 '20 16:01

We would like to be able to produce the predictive distribution from the Stan likelihood file without any restrictions on how the posterior was computed. So in R we would have a function that uses the likelihood file but returns a predictive distribution, although, as you say, it does not necessarily need to expose the likelihood code. I have no good answer yet.

MansMeg, Jan 06 '20 09:01