
check_overdispersion for binomial models

Open richardjtelford opened this issue 2 months ago • 2 comments

I've been having a problem using check_overdispersion() on a binomial glm. The dataset is quite small and overdispersion is expected (due to the impact of weather, etc.), but check_overdispersion() reports no overdispersion.

I asked at DHARMa

https://github.com/florianhartig/DHARMa/issues/502

I was told that the DHARMa default dispersion test would have low power, and that I should use the Pearson Chi-squared test instead.

What is the reason for check_overdispersion with a binomial GLM using DHARMa's dispersion test with simulated data rather than the Pearson Chi-squared test (which I think is being used for Poisson GLMs)? Would it be possible to give users a choice?


Incidentally, trying to plot the result of check_overdispersion() with a binomial glm gives an error:

performance::check_overdispersion(mod1) |> plot()
# Error in .model_diagnostic_overdispersion(model) : object 'd' not found

I think the problem is that the code does not reject unsupported models.

richardjtelford • Oct 23 '25 18:10

Can you give a reproducible example? Note that Pearson tests for binomial models require multiple trials per observation and are not applicable to Bernoulli (single-trial) models.

bwiernik • Oct 23 '25 19:10

dat <- structure(list(n = c(55, 59, 74, 7, 54, 54, 57, 48, 55, 57, 41, 
20, 13, 21, 13, 32, 38, 42, 37, 14, 19), success = c(26, 35, 
28, 6, 16, 35, 28, 21, 31, 10, 2, 9, 7, 18, 1, 26, 28, 3, 17, 
11, 17), x = c(3.464, 1.599, 3.39, 3.047, 2.442, 1.777, 3.363, 
4.626, 2.701, 4.636, 3.622, 2.031, 1.666, 2.218, 3.338, 4.255, 
2.476, 4.727, 3.317, 3.925, 2.854)), row.names = c(NA, -21L), class = "data.frame")

mod1 <- glm(cbind(success, n - success) ~ x, data = dat, family = binomial)

anova(mod1) # residual deviance >> residual df
#> Analysis of Deviance Table
#> 
#> Model: binomial, link: logit
#> 
#> Response: cbind(success, n - success)
#> 
#> Terms added sequentially (first to last)
#> 
#> 
#>      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
#> NULL                    20     189.56              
#> x     1   39.423        19     150.13 3.413e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
performance::check_overdispersion(mod1)
#> # Overdispersion test
#> 
#>  dispersion ratio = 1.083
#>           p-value =  0.48
#> No overdispersion detected.

Created on 2025-10-23 with reprex v2.1.1

richardjtelford • Oct 23 '25 19:10
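
For comparison, the Pearson Chi-squared dispersion check mentioned in the thread can be computed by hand with base R. This is only a minimal sketch, not what check_overdispersion() does internally: the helper name pearson_dispersion() is hypothetical, the data and model are copied from the reprex above, and the check assumes a binomial model with multiple trials per observation (it does not apply to Bernoulli data).

```r
# Data and model copied from the reprex in this thread.
dat <- data.frame(
  n = c(55, 59, 74, 7, 54, 54, 57, 48, 55, 57, 41,
        20, 13, 21, 13, 32, 38, 42, 37, 14, 19),
  success = c(26, 35, 28, 6, 16, 35, 28, 21, 31, 10, 2,
              9, 7, 18, 1, 26, 28, 3, 17, 11, 17),
  x = c(3.464, 1.599, 3.39, 3.047, 2.442, 1.777, 3.363,
        4.626, 2.701, 4.636, 3.622, 2.031, 1.666, 2.218,
        3.338, 4.255, 2.476, 4.727, 3.317, 3.925, 2.854)
)

mod1 <- glm(cbind(success, n - success) ~ x, data = dat, family = binomial)

# Hypothetical helper (not part of performance or DHARMa):
# Pearson Chi-squared statistic divided by the residual degrees of freedom,
# with an upper-tail p-value from the Chi-squared distribution.
pearson_dispersion <- function(model) {
  chisq <- sum(residuals(model, type = "pearson")^2)
  df <- df.residual(model)
  c(
    chisq = chisq,
    df = df,
    ratio = chisq / df,
    p_value = pchisq(chisq, df, lower.tail = FALSE)
  )
}

res <- pearson_dispersion(mod1)
print(res)
```

The ratio computed this way is the same quantity that summary() reports as the dispersion parameter when the model is refitted with family = quasibinomial, which gives a quick cross-check.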