insight
insight copied to clipboard
Validate `get_variance()` against remaining families
As a follow-up of #877
Following families need validation or don't yet work:
- [ ] betabinomial
- [ ] hurdle models
- [ ] zero-inflated models
- [ ] glmmTMB::compois
- [ ] glmmTMB::genpois
- [ ] glmmTMB::lognormal - these models have low fixed effects variance, leading to R2 close to 0. Calculation needs revision here?
Tagging @bbolker FYI (also tagging @bwiernik )
-
How could we possibly validate genpois and compois families? Are there any related families that would return a similar R2 so we have a reference for validating the code?
-
How to calculate the distribution-specific variance for zero-inflated models? (or at least: what's more accurate, see https://github.com/easystats/insight/pull/893#issuecomment-2175532919
-
Is it expected that the R2 is lower for zero-inflated models, when the full model (zero-inflation and conditional part) is taken into account (as opposed to the conditional component of zero-inflated models only, see following example)?
Regarding Zero-Inflation models
Formerly the dispersion parameter for Poisson and ZI Poisson was set to 1. Now, the behaviour for ZI Poisson only has changed, returning a different dispersion / variance:
# For zero-inflated poisson models, the
# distributional variance is based on Zuur et al. 2012
# ----------------------------------------------
.variance_zip <- function(model, faminfo, family_var) {
if (inherits(model, "glmmTMB")) {
p <- stats::predict(model, type = "zprob")
mu <- stats::predict(model, type = "conditional")
pvar <- (1 - p) * (mu + p * mu^2)
} else if (inherits(model, "MixMod")) {
p <- stats::plogis(stats::predict(model, type_pred = "link", type = "zero_part"))
mu <- suppressWarnings(stats::predict(model, type = "mean_subject"))
pvar <- (1 - p) * (mu + p * mu^2)
} else {
pvar <- family_var
}
mean(pvar)
}
Taking following model, for this particular example, this comes closer to a Bayesian model than setting sigma/dispersion to 1 (for zero-inflation models!)
m <- glmmTMB::glmmTMB(count ~ mined + cover + (1 + cover | site),
ziformula = ~mined,
family = poisson(), data = Salamanders
)
performance::r2_nakagawa(m)
Formerly with sigma/dispersion = 1
# R2 for Mixed Models
Conditional R2: 0.650
Marginal R2: 0.525
Now
# R2 for Mixed Models
Conditional R2: 0.414
Marginal R2: 0.334
brms returns (marginal R2 only)
library(brms)
m2 <- brms::brm(bf(count ~ mined + (1 | site), zi ~mined),
family = brms::zero_inflated_poisson(), data = Salamanders, backend = "rstan"
)
brms::bayes_R2(m2)
Estimate Est.Error Q2.5 Q97.5
R2 0.1686378 0.01874732 0.1334217 0.2070608
Any ideas how to validate the results? @bbolker @bwiernik ?
We have following deviation to MuMIn:
- Computation of fixed effects variance: https://github.com/easystats/insight/issues/877#issuecomment-2163052765
- Computation of distribution-specific variance for negative binomial models (negbin/nbinom1): https://github.com/easystats/insight/issues/877#issuecomment-2165551707
- Computation of Sigma for glmmTMB: https://github.com/easystats/insight/issues/877#issuecomment-2165696977
However, the things we do differently are in line with the code in the Supplement 2 of Nakagawa. Thus, I would say we go with the current implementation of get_variance(). I added lots of tests to validate against external evidence of correct results.
@bbolker null_model() probably doesn't calculate the correct null model for zero-inflation models. What would be the correct null model in the next example?
data(fish, package = "insight")
m1 <- glmmTMB::glmmTMB(
count ~ child + camper + (1 | persons),
ziformula = ~ child + camper + (1 | persons),
data = fish,
family = poisson()
)
summary(insight::null_model(m1))
#> Family: poisson ( log )
#> Formula: count ~ (1 | persons)
#> Zero inflation: ~child + camper + (1 | persons)
#> Data: fish
#>
#> AIC BIC logLik deviance df.resid
#> 1808.7 1829.8 -898.3 1796.7 244
#>
#> Random effects:
#>
#> Conditional model:
#> Groups Name Variance Std.Dev.
#> persons (Intercept) 0.6808 0.8251
#> Number of obs: 250, groups: persons, 4
#>
#> Zero-inflation model:
#> Groups Name Variance Std.Dev.
#> persons (Intercept) 1.069 1.034
#> Number of obs: 250, groups: persons, 4
#>
#> Conditional model:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.5875 0.4173 3.804 0.000142 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Zero-inflation model:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.4988 0.5949 -0.839 0.40172
#> child 2.0850 0.3212 6.491 8.52e-11 ***
#> camper1 -1.0863 0.3447 -3.151 0.00163 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2024-06-14 with reprex v2.1.0
Presumably it would be
formula = count ~ 1 + (1 | persons),
ziformula = ~1 + (1 | persons)
?