insight icon indicating copy to clipboard operation
insight copied to clipboard

Differences between get_predicted and get_predicted_ci for mixed models

Open fhui28 opened this issue 2 years ago • 3 comments

This is something that came up during a visit to @bbolker with @emitanaka. Basically, for (G)LMMs, the standard errors returned by get_predicted and get_predicted_ci differ.

An example for LMMs first:

library(tidyverse)
library(lme4)
library(glmmTMB)
library(insight)

m1 <- lmer(Petal.Width ~ Sepal.Length + (1 | Species), iris)
get_predicted(m1, ci = 0.95) %>% as.data.frame() %>% head
predictions <- predict(m1)
get_predicted_ci(m1, predictions) %>% as.data.frame() %>% head 
#' Matches. Currently, both of these standard errors are actually wrong because they 
#' do not account for the uncertainty in the predicted random effects. 
#' This is an lme4 thing though, and Ben Bolker among others is aware of this. 


m2 <- glmmTMB(Petal.Width ~ Sepal.Length + (1 | Species), iris, REML = TRUE)
get_predicted(m2, ci = 0.95) %>% as.data.frame() %>% head 
predictions <- predict(m2)
get_predicted_ci(m2, predictions) %>% as.data.frame() %>% head 
#' Does not match. The former gets the TMB prediction error,  which is better. 
#' But the latter reverts back to the lme4 incorrect prediction error. 

For the issue above, and with respect to the insights package, my bigger concern is the inconsistency in modem m2. As mentioned, the issue why the prediction errors differ between lmer and glmmTMB is something @bbolker is already aware of; see this github issue from glmmTMB

Things gets a bit weirder for GLMMs though, and this is where I will throw my hands up in the air (even higher!)

m1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial)
get_predicted(m1, ci = 0.95) %>% as.data.frame() %>% head
predictions <- predict(m1, type = "response")
get_predicted_ci(m1, predictions) %>% as.data.frame() %>% head 
#' These two do not match (the latter is conisderably larger); not sure why.


m2 <- glmmTMB(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial)
get_predicted(m2, ci = 0.95) %>% as.data.frame() %>% head
predictions <- predict(m2, type = "response")
get_predicted_ci(m2, predictions) %>% as.data.frame() %>% head 
#' These two also do not match (the latter is conisderably larger); not sure why. 
#' Also, note the latter gets very similar results to get_predicted_ci(m1, predictions), 
#' while the former is not the same as get_predicted(m1, ci = 0.95)

fhui28 avatar Aug 16 '23 00:08 fhui28

get_predicted_ci() is mostly (internally) used for those model classes that do not return any standard errors for predictions (i.e. when there's nothing like se.fit = TRUE). This also applies to merMod-models. For these models, we use something like:

M <- model.matrix(m1)
V <- vcov(m1)
se <- sqrt(diag(M %*% V %*% t(M)))

based on https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#lme4. Thus, when calling get_predicted_ci(), standard errors are always based on the above formula. get_predicted() returns standard errors from predict() where possible. That's why get_predicted.glmmTMB() relies on standard errors returned by predict.glmmTMB(), but when calling get_predicted_ci() - even for a glmmTMB model - the above formula is used. That explains the difference between standard errors for glmmTMB models returned by get_predicted() or get_predicted_ci().

I think this should be clearly documented, and that get_predicted() is preferred (we may even hide get_predicted_ci() from the users?) in order to get SE and CI.

strengejacke avatar Sep 26 '23 20:09 strengejacke

Tagging @bbolker just FYI.

strengejacke avatar Sep 26 '23 20:09 strengejacke

Not sure if the documentation I added is sufficient, so I'll keep this issue open.

strengejacke avatar Sep 27 '23 11:09 strengejacke