posteriordb: Add structure for likelihood to compute predictive distributions based on draws

In model slot.

Dec 04 '19 18:12 MansMeg

Would this be an additional slot in the model info file like blr.info.json or in a model code file like blr.stan?

Dec 12 '19 14:12 eerolinna

I'm actually starting to think of this as a separate Stan function for now (in a separate Stan file). We can then add it along the way.

Dec 14 '19 10:12 MansMeg

So you're thinking that blr.info.json would be something like this?

{
  "name": "blr",
  "keywords": [],
  "title": "A Bayesian linear regression model with vague priors",
  "description": "A Bayesian linear regression model with vague priors.",
  "urls": [],
  "model_code": {
    "stan": "models/stan/blr.stan"
  },
  "likelihood_code": {
    "stan": "models/stan/blr_likelihood.stan" 
  },
  "references": null,
  "added_date": "2019-11-29",
  "added_by": "Mans Magnusson"
}

Dec 14 '19 14:12 eerolinna

Yes. Exactly!

Dec 14 '19 15:12 MansMeg

With PyMC there is no need for any extra code to compute predictive distributions:

posterior_predictive_samples = pymc.sample_posterior_predictive(draws, model=model)

where model is any PyMC model and draws are the posterior draws computed with PyMC. Source: https://docs.pymc.io/notebooks/posterior_predictive.html

It seems backwards to include likelihood_code in the model info file when not all model implementations will require it.

Ideal solution

From my perspective, the ideal solution looks like this.

Change blr.info.json to this:

{
  "name": "blr",
  "model_implementations": {
    "stan": "models/stan/blr.info.json",
    "pymc": "models/pymc/blr.info.json"
  }
}

(Some slots, such as keywords, have been omitted for clarity, but they would still remain here.)

models/stan/blr.info.json would be

{
  "code": "models/stan/blr.stan",
  "likelihood": "models/stan/blr_likelihood.stan"
}

models/pymc/blr.info.json would be

{
  "code": "models/pymc/blr.py"
}

Dec 15 '19 18:12 eerolinna

I agree that this is probably the right way to go. Well spotted.

Dec 16 '19 07:12 MansMeg

How should we expose the likelihood/predictive distribution to users?

In other words, if I have a posterior object po <- posterior("8_schools"), what would the API be like to access the likelihood/predictive distribution?

Would it be something like stan_predictive_draws(po, posterior_draws)? Would we also want something like stan_likelihood(po, posterior_draws)? What would this return?

These could be generalized to predictive_draws(po, posterior_draws, framework = "stan") and the same for stan_likelihood.
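One way to picture the generalized call is a small dispatcher keyed on the framework name. The sketch below is in Python rather than R, and every name in it (the registry, `register_backend`, the stub backend and its return value) is hypothetical and only illustrates the shape of the API; the stub does not run any Stan code.

```python
# Hypothetical registry: framework name -> function producing predictive draws.
_PREDICTIVE_BACKENDS = {}


def register_backend(framework):
    """Decorator that registers a predictive-draws backend for a framework."""
    def decorator(fn):
        _PREDICTIVE_BACKENDS[framework] = fn
        return fn
    return decorator


def predictive_draws(po, posterior_draws, framework="stan"):
    """Dispatch to the backend registered for the requested framework."""
    try:
        backend = _PREDICTIVE_BACKENDS[framework]
    except KeyError:
        raise ValueError(f"unsupported framework: {framework!r}") from None
    return backend(po, posterior_draws)


@register_backend("stan")
def _stan_predictive_draws(po, posterior_draws):
    # Stub (assumption): a real backend would evaluate the Stan likelihood
    # file for this posterior. Here we only report what was requested.
    return {"framework": "stan", "posterior": po, "n_draws": len(posterior_draws)}
```

With this shape, stan_predictive_draws(po, posterior_draws) becomes the special case predictive_draws(po, posterior_draws, framework = "stan"), and adding a framework means registering one more backend.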

Dec 17 '19 17:12 eerolinna

Yes, that would probably be a good idea

Dec 18 '19 07:12 MansMeg

Write a suggestion for the 8-schools example of how it should look and review it with Paul.

The structure change suggested by @eerolinna needs to be done before @jarnefeltoliver starts to fill out the db.

Dec 19 '19 11:12 MansMeg

I have now implemented it, but I think it is simpler to just add a JSON object directly:

{
  "name": "blr",
  "model_implementations": {
    "stan": {
      "model_code": "models/stan/blr.stan",
      "likelihood_code": "models/stan/blr_likelihood.stan"
    },
    "pymc": {
      "model_code": "models/pymc/blr.py"
    }
  }
}
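As a rough illustration of how a client could read this nested layout, the Python sketch below looks up the likelihood code for a given framework and returns None when an implementation (such as PyMC) does not need one. The helper name `likelihood_code_path` is hypothetical and not part of posteriordb.

```python
import json

# Example info file content, following the nested layout discussed above.
INFO_JSON = """
{
  "name": "blr",
  "model_implementations": {
    "stan": {
      "model_code": "models/stan/blr.stan",
      "likelihood_code": "models/stan/blr_likelihood.stan"
    },
    "pymc": {
      "model_code": "models/pymc/blr.py"
    }
  }
}
"""


def likelihood_code_path(info, framework):
    """Return the likelihood code path for a framework, or None if it has none."""
    implementations = info["model_implementations"]
    if framework not in implementations:
        raise KeyError(f"no {framework!r} implementation for model {info['name']!r}")
    # .get() makes the likelihood slot optional per implementation.
    return implementations[framework].get("likelihood_code")


info = json.loads(INFO_JSON)
print(likelihood_code_path(info, "stan"))
print(likelihood_code_path(info, "pymc"))
```

Treating likelihood_code as optional per implementation keeps the PyMC entry free of a slot it does not need.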

Jan 03 '20 11:01 MansMeg

That's a good idea!

Jan 03 '20 12:01 eerolinna

Do we want to have an API that exposes the likelihood code? Something like

stan_likelihood_code_file(po)

that would for the blr posterior return models/stan/blr_likelihood.stan.

Or is it sufficient that we have something like

posterior_predictive_draws(po, framework = "stan")

that essentially keeps the likelihood code file as an implementation detail instead of public API?

Arguments for keeping the likelihood code file out of the public API

Smaller API surface is easier to learn, understand and maintain
If Stan ever adds a way to automatically compute predictive draws without a separate likelihood definition we can remove the likelihood code files without breaking anyone's code

Arguments for adding the likelihood code to the public API

Someone might need the likelihood code outside of computing predictive draws?

Jan 03 '20 16:01 eerolinna

We would like to be able to produce the predictive distribution using the Stan likelihood file without any restrictions on how the posterior was computed. So in R we would have a function that uses the likelihood file but returns a predictive distribution. As you say, though, this does not necessarily need to expose the likelihood code. I have no good answer yet.
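To make the "no restrictions on how the posterior was computed" point concrete, here is a minimal Python sketch that turns posterior draws into predictive draws for a linear regression. It assumes a normal likelihood, y ~ normal(X * beta, sigma), and draws stored as dictionaries with "beta" and "sigma" entries; both are assumptions for illustration, not the actual blr likelihood file or posteriordb's draw format.

```python
import random


def blr_predictive_draws(posterior_draws, X, seed=0):
    """Simulate one predictive data set per posterior draw.

    Assumes y ~ normal(X * beta, sigma). The draws can come from any
    sampler, since only their values are used here.
    """
    rng = random.Random(seed)
    y_rep = []
    for draw in posterior_draws:
        beta, sigma = draw["beta"], draw["sigma"]
        row = []
        for x in X:
            # Linear predictor for this observation.
            mean = sum(b * xi for b, xi in zip(beta, x))
            row.append(rng.gauss(mean, sigma))
        y_rep.append(row)
    return y_rep
```

The function only needs the draws and the data, which is the property the Stan likelihood file would have to provide as well.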

Jan 06 '20 09:01 MansMeg
