insight icon indicating copy to clipboard operation
insight copied to clipboard

get_predicted(): features and caveats

Open DominiqueMakowski opened this issue 4 years ago • 18 comments
trafficstars

Here's a list of all features / issues etc., to have a global view and avoid opening multiple issues.

  • [x] API Design (https://github.com/easystats/insight/issues/310)
  • [x] Fix for "log()" terms in formula (https://github.com/easystats/insight/issues/314)
  • [x] Multiple CIs (https://github.com/easystats/insight/issues/313)
  • [x] Prediction intervals for Bayesian GLMs (https://github.com/easystats/insight/issues/310#issuecomment-792276163)
  • [x] Prediction intervals for glms
    • [x] Binomial: https://fromthebottomoftheheap.net/2017/05/01/glm-prediction-intervals-i/
    • [x] Poisson: https://fromthebottomoftheheap.net/2017/05/01/glm-prediction-intervals-ii/
    • Others?
  • [x] Correct intervals for lme4 (https://github.com/easystats/insight/issues/298)
  • [ ] Prediction intervals for glmmTMB: Also, why are confidence intervals so different in respect to lme4 (could be related to issue above)
  • [ ] Make user-life easier and create valid data grid / reference grid for data argument, when holding constant non-focal co-variates (https://github.com/easystats/insight/issues/316)
  • [ ] ~Prediction~ Confidence intervals for GAMs <edit by Mattan: these are CIs, not PIs>
    • https://fromthebottomoftheheap.net/2016/12/15/simultaneous-interval-revisited/
    • https://github.com/gavinsimpson/gratia/blob/master/R/confint-methods.R
    • https://github.com/gavinsimpson/gratia/blob/master/R/posterior-samples.R
  • [ ] @strengejacke @mattansb is it possible to get PIs for GAMs? Can we use the same method that for lm/glms?
  • [ ] predict = "observation", transformed output to match shape of observations (i.e., Zeros and Ones for logistic)
  • [ ] Make prediction robust to missing data in newdata; must first filter newdata, make predictions, and then re-add rows that have missing values with missing predictions (must do that also for draws, ci_vals)
  • [ ] Support for other models

DominiqueMakowski avatar Mar 08 '21 02:03 DominiqueMakowski

Do you think this function should ultimately be assigned to its own package?

A package like prediction, predicted, etc. (not sure which names are available on CRAN).

I would also recommend having an alias for this function called model_predictions.

This would mean the ecosystem would have a consistent function naming schemas and also mirror functions for broom functions:

broom::tidy     <->  parameters::model_parameters
broom::glance   <->  performance::model_performance
broom::augment  <->  prediction::model_prediction

What do you think? cc @strengejacke, @mattansb

IndrajeetPatil avatar Mar 08 '21 13:03 IndrajeetPatil

Well, we have the modelbased package. I think @DominiqueMakowski just wanted to put the workhorse into insight, and then the "visible" user function will be in modelbased.

strengejacke avatar Mar 08 '21 13:03 strengejacke

the "visible" user function will be in modelbased

I see, gotcha!

What do you think about calling this function model_prediction?

IndrajeetPatil avatar Mar 08 '21 13:03 IndrajeetPatil

Yes correct, the idea was to have the heavy lifting done at the insight level, mostly cos it is useful elsewhere (performance indices for instance or such). Then, modelbased will be a user-facing package focusing mostly on visualization of models (including visualization of links, contrasts, marginal means etc). I plan to add nice vignettes to modelbased dicsussing a model-based approach to stats, how to make inference and derive indices from models etc.

the function are currently named estimate_link and estimate_response but given the new API of get_predicted I might change to something like estimate_relationship() and estimate_response() or something like this

DominiqueMakowski avatar Mar 08 '21 13:03 DominiqueMakowski

But we would still use both emmeans and predict, right?

strengejacke avatar Mar 08 '21 13:03 strengejacke

yes yes for estimate_means and estimate_contrasts as the internals of that are definitly beyond us

DominiqueMakowski avatar Mar 08 '21 13:03 DominiqueMakowski

Not as option for estimate_response? See https://strengejacke.github.io/ggeffects/articles/technical_differencepredictemmeans.html

strengejacke avatar Mar 08 '21 13:03 strengejacke

mmh we'll see, but I feel like for modelbased we could really merge your and indra's experience with model visualization to make something really neat and powerful.

DominiqueMakowski avatar Mar 08 '21 13:03 DominiqueMakowski

Support for other models, here I list some special cases:

  • logistf: https://github.com/strengejacke/ggeffects/blob/master/R/get_predictions_logistf.R
  • bamlss: https://github.com/strengejacke/ggeffects/blob/master/R/get_predictions_bamlss.R
  • clm (and other ordinal / cumulative link models): https://github.com/strengejacke/ggeffects/blob/master/R/get_predictions_clm.R (these are great, you have to deal with the different outcome levels)
  • lrm, multinom, ols, polr, ...: https://github.com/strengejacke/ggeffects/blob/master/R/get_predictions_lrm.R (different names for "type" argument)
  • VGAM: https://github.com/strengejacke/ggeffects/blob/master/R/get_predictions_vglm.R (different name for "predict"?)

strengejacke avatar Mar 09 '21 08:03 strengejacke

"clmm" is only supported by emmeans, has not predict.

strengejacke avatar Mar 09 '21 08:03 strengejacke

I've added prediction intervals for Binomial and Poisson models:

library(insight)
library(magrittr)

glm(am ~ cyl + mpg + hp, data = mtcars,
    family = binomial()) %>% 
  get_predicted(predict = "prediction") %>% 
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4         0.1762632      0       1
#> Mazda RX4 Wag     0.1762632      0       1
#> Datsun 710        0.6499998      0       1
#> Hornet 4 Drive    0.2489592      0       1
#> Hornet Sportabout 0.2268359      0       1
#> Valiant           0.0064952      0       0

glm(gear ~ ., data = mtcars,
    family = poisson()) %>% 
  get_predicted(predict = "prediction") %>% 
  as.data.frame() %>% head() %>% zapsmall()
#>                   Predicted CI_low CI_high
#> Mazda RX4          4.266709      1       9
#> Mazda RX4 Wag      4.192287      1       9
#> Datsun 710         4.031290      1       8
#> Hornet 4 Drive     3.043687      0       7
#> Hornet Sportabout  2.967043      0       7
#> Valiant            2.890863      0       7

Created on 2021-03-10 by the reprex package (v1.0.0)

The Gaussian PIs still work as they use to (:

As I noted somewhere prior, each family has its own PI method, so adding each these analytical solutions is somewhat tedious... It might be easier to simulate population values and get the 95% ETI from there instead.


Also, these columns should really be PI_low and PI_high...

mattansb avatar Mar 10 '21 10:03 mattansb

@mattansb I hope you found the code clear and logical :)

One thing to make sure of:

Following your comments, currently the PIs for Bayesian models are obtained via their bespoke function:

https://github.com/easystats/insight/blob/da33355faf3868616736499f793377e88e2d8d30/R/get_predicted_ci.R#L115

instead of manually computing them from the draws. But does their function does something else than that? Or is it unnecessarily re-sampling draws from the posterior? (note that the iterations were obtained via posterior_predict).

DominiqueMakowski avatar Mar 10 '21 14:03 DominiqueMakowski

instead of manually computing them from the draws. But does their function does something else than that?

Nope - that's exactly what it does:

https://github.com/stan-dev/rstanarm/blob/dee0a2d45bf42b2df791072041151b753edd6af9/R/predictive_interval.R#L82-L91

mattansb avatar Mar 10 '21 14:03 mattansb

thanks for addressing my source-code-checking laziness 😁 - I'll drop that exception then

DominiqueMakowski avatar Mar 10 '21 14:03 DominiqueMakowski

@strengejacke @mattansb is it possible to get PIs for GAMs? Can we use the same method that for lm/glms?

The code I wrote should also work with GLM GAMs. Is there any reason the same won't apply for gaussian models?

mattansb avatar Mar 15 '21 14:03 mattansb

@DominiqueMakowski Currently, some methods like get_predicted.glmmTMB() directly check the predict argument, while others like get_predicted.lmerMod() call .get_predicted_args() to check the input. This leads to the situation that we have methods that have predict = c("expectation", "link", "prediction", "response", "relation") in their usage, while others just have predict = "expectation". Can this be harmonized?

strengejacke avatar Jul 26 '21 09:07 strengejacke

See https://github.com/easystats/insight/commit/4aa8e44bb22980857a4ec24e47a984b5367da4e2, else https://github.com/easystats/insight/blob/4aa8e44bb22980857a4ec24e47a984b5367da4e2/tests/testthat/test-get_predicted.R#L150 fails

strengejacke avatar Jul 26 '21 09:07 strengejacke

Prediction intervals for glmmTMB: Also, why are confidence intervals so different in respect to lme4 (could be related to issue above)

Note that by default, glmmTMB uses Wald CIs on fixed effects/log variance components, whereas lme4 uses profile likelihood CIs on fixed effects/variance components

bwiernik avatar Jul 26 '21 15:07 bwiernik