
Update dispersion test recommendations

Open florianhartig opened this issue 3 years ago • 13 comments

Dispersion test recommendations in the vignette are outdated and should be updated

florianhartig avatar Sep 08 '20 12:09 florianhartig

Hi Florian,

I guess 3 weeks counts as soonish? Well, everyone's lives are difficult right now, and I've wasted a substantial amount of time planning a vacation that never happened. Anyway, enough of the chatter.

The problems with the current approach

I think I have identified the major issues with the current approach of comparing the standard deviation of y_observed - y_predicted with the standard deviations of y_simulated - y_predicted.

  1. A very minor point: overdispersion seems to usually be expressed as observed variance / theoretical variance. That's just the square of the current test statistic and has no impact on p-values, but it makes the statistic easier to talk about and allows for easier comparison of different methods.

  2. The big one: overdispersion is inherently a property of conditional residuals. To illustrate the point: for a mixed Poisson model the assumption is that y_i ~ Pois(lambda_i = inverse link function(fixed effect + random effect)), and y_i therefore has variance lambda_i. Testing this necessarily requires controlling for the random effect; in fact, overdispersion could very well be caused by a missing random effect. Now, if you leave the random-effect variability in the simulations, you get something like:

(theoretical var of y due to FE * overdispersion + theoretical var of y due to RE) / (theoretical var of y due to FE + theoretical var of y due to RE)

This will be close to 1 if the REs are large in comparison to the overdispersion, which makes the test very underpowered and renders the test statistic uninformative (a small numeric sketch follows after this list). I think you've seen that in multiple issues already.

  3. The medium one: the current approach ignores the inherent heteroscedasticity of e.g. Poisson models. If y_i has a larger lambda_i than all the others, then the theoretical variance of y_i is lambda_i, so the corresponding simulations will have more variance and may dominate the overall estimate. This matters especially because the default link function is log, so the lambdas are exp(linear predictor), which can have a huge spread.
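To make the dilution in point 2 concrete, here is a tiny numeric sketch; the variance components are made-up illustration values, not output from any model:

# hypothetical variance components of y on the response scale
var_FE <- 1                  # variance attributable to the fixed effects
var_RE <- 10                 # variance attributable to the random effects
overdispersion_factor <- 2   # true conditional overdispersion
# the ratio from the formula above: close to 1 despite strong overdispersion
(var_FE * overdispersion_factor + var_RE) / (var_FE + var_RE)  # ~1.09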

Now, to actually show point 3, I had to construct some fairly extreme data, where the 10% largest lambdas (and they are way larger) have no overdispersion, while the rest are substantially overdispersed:

require(DHARMa)
require(lme4)
require(performance)
n <- 10000
x <- rnorm(n)
# first 1000 observations have very large lambdas
x2 <- ifelse(1:n <= 1000, 10, 0)
f <- as.factor(rep(1:10, each = n/10))
rf <- rnorm(10, sd = 0.1)
# first 1000 observations have no overdispersion
overdisp <- c(rep(0, 1000), rnorm(n - 1000, sd = 0.15))
# plot(x + x2, overdisp)
d <- data.frame(y = rpois(n, lambda = exp(5 + x + x2 + rf[f] + overdisp)),
                x = x,
                x2 = x2,  # include x2 in the data so the formula doesn't rely on the global environment
                f = f)
loc_model <- glmer(y ~ x + x2 + (1|f), data = d, family = poisson)
summary(loc_model)
res_conditional <- simulateResiduals(loc_model, re.form = NULL)
plot(res_conditional)
testDispersion(res_conditional)
check_overdispersion(loc_model) # ratio is about 6

Of course the other tests pick up on the massive issue, but I think this distorts the results for large linear predictors and link = log, since then the spread of the lambdas becomes larger.

Alternative approaches

The obvious one is check_overdispersion from the performance package. It treats the sum of squared Pearson residuals as chi-squared distributed. Now for my idea: if you look at the formula for Pearson residuals (https://stats.stackexchange.com/questions/45479/pearsons-residuals) you see (x - mu)/sigma, which is also a Z-transformation. Not that the result is necessarily standard normal, but within DHARMa we can generate standard normal residuals via the inverse CDF and a sensible outlier value, e.g. E[Z | Z = maximum of nSim+1 standard normal variables] (see here: https://math.stackexchange.com/questions/473229/expected-value-for-maximum-of-n-normal-random-variable).
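A rough sketch of that idea (just an illustration, not the comparison code mentioned below), using the quantileFunction / outlierValues arguments of residuals.DHARMa; the outlier value, roughly 2.82 for nSim = 250, is the expected maximum of nSim + 1 standard normal variables:

require(DHARMa)
testData <- createData(sampleSize = 500, family = poisson(), randomEffectVariance = 0)
fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)
res <- simulateResiduals(fittedModel, n = 250)
# approximately standard-normal residuals via the inverse normal CDF;
# exact 0/1 residuals are replaced by a finite stand-in value
z_res <- residuals(res, quantileFunction = qnorm, outlierValues = c(-2.82, 2.82))
sum(z_res^2)  # Pearson-like sum of squares on the simulation-based residuals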

It's getting late, so I will post my comparison of their performance tomorrow, but in short: my idea works OK, but check_overdispersion apparently works at least as well, if not better, in every situation I tested.

till then all the best, Lukas

Istalan avatar Oct 24 '20 22:10 Istalan

Sorry for the delay. I spent some more time thinking and I think I've figured out why my test idea is slightly anti-conservative. Since the PIT is performed with an imperfect CDF, the resulting distribution isn't perfectly uniform, and the result is a slightly higher variance (1.015, roughly, at medium lambdas), which is almost always significant for high n (e.g. 20,000). The KS test actually has the same issue for n at 50,000. Admittedly I only ran that test twice, since my real computer is in the shop and my laptop is good but not great. But in terms of real usage, perhaps this test is too sensitive where the KS test is not.
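To illustrate the mechanism with a toy calculation (a generic empirical-CDF PIT for a continuous response, not DHARMa's exact algorithm): for a new observation exchangeable with the nSim simulations, the number of simulations below it is uniform on 0..nSim, so the PIT value has variance (1 + 2/nSim)/12 instead of 1/12:

nSim <- 250
pit_support <- (0:nSim) / nSim                    # possible PIT values, all equally likely
v <- mean((pit_support - mean(pit_support))^2)    # population variance of the PIT
v * 12                                            # 1 + 2/nSim ~ 1.008, i.e. slightly above 1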

Here's the Code: fin_dispersion (copy).txt

Some explanations about my code in the attachment: first I calculate the outlier value, then I define a one-sided test for overdispersion, since check_overdispersion is also one-sided. Then I combine the three tests to test on a model without overdispersion, with relatively little variance in the random and fixed effects, and run simulations for three different intercepts:

  1. large, lambdas around exp(10)
  2. medium, lambdas around exp(1)
  3. small, lambdas around exp(-2)

The number of simulations and the n's are chosen to keep run times on my laptop at 1-2 minutes; scenarios 2 and 3 could perhaps use more of both. The n's are also chosen to avoid singular model fits, but a small test I did seemed to show that the singular models do not distort the results.
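Not the attached code, just a minimal sketch of the kind of calibration run I mean (settings chosen only for speed): simulate data without overdispersion, fit, test, and check that the p-values come out roughly uniform:

require(DHARMa)
p_vals <- replicate(200, {
  testData <- createData(sampleSize = 200, family = poisson(),
                         randomEffectVariance = 0, overdispersion = 0)
  m <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)
  testDispersion(simulateResiduals(m), plot = FALSE)$p.value
})
hist(p_vals, breaks = 20)   # should look roughly flat under H0
mean(p_vals < 0.05)         # should be close to 0.05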

On large lambdas my idea works quite well, but on medium lambdas P(p < 0.05) is at about 0.08. There is also strange behavior for small lambdas, where the p-values are almost normally distributed; I don't know why. And while all results are usually very closely correlated, here the main source of randomness in the new approach is the jiggling in the PIT, as most values in both the observed response and the simulations are 0.

The old test creates a similar bell shape for large and small lambdas, but it does surprisingly well for the medium case, although sometimes a slight bell shape is visible even there.

Now, check_overdispersion unsurprisingly does perfectly for large lambdas, but, very surprisingly to me, it becomes a bit more conservative for smaller lambdas. I know that I have claimed the opposite in the past. Apparently I made a big mistake by calculating with the actual lambdas instead of the sample mean: dividing by that estimate in the squared Pearson residual flattens the extreme results (the ones with extreme means) a lot. A good reminder to take all my statements with a grain of salt :).

What to implement?

The only points that speak against it are that a) it's not perfect, b) it needs residuals(fittedModel, type = "pearson") implemented, and c) it doesn't use any DHARMa features. Point c) isn't that important, but it is a little sad. Perhaps my idea could be properly calibrated using a bootstrap? The bias is parameter dependent, so a blanket correction, like a shrinkage factor based on nSim, won't do. But regardless, any implementation based on DHARMa's quantile residuals would first require something like a require.DHARMa.conditional() function.

Tell me what you think and I'll get back to you.

Istalan avatar Oct 26 '20 22:10 Istalan

Hi Lukas,

sorry for the late reply, I am currently a bit overwhelmed with work. Thanks so much for the thoughts!

I agree with (and was aware of) issue 2, and I would file 3 as a sub-issue of that. In the end, this all arises because I currently measure dispersion on the response, and not on the residuals. The current test was just a quick fix to get something going at the time I was implementing this, but I never found the time to improve it. I don't have a perfect solution, but here are a few initial thoughts:

  • If possible in any way, I would like to have a test that is based on the simulations only, without requiring Pearson residuals or anything like that. The issue with Pearson residuals is that they are not supported by all models, and this is probably even more of a problem for the more advanced packages that I plan to add to DHARMa in the future. However, we can possibly emulate Pearson residuals via the simulations (a rough sketch follows after this list).

  • Regarding the performance, I would note that I would of course like to get close to check_overdispersion, but I don't think it's crucial that we achieve the same power. The more important goals would be to a) achieve a reasonable overall power, and b) get an interpretable dispersion parameter (as you noted already)

  • Regarding your current results: note that the current test often behaves very differently (and is typically far more sensitive) when switching to conditional simulations, see e.g. https://github.com/florianhartig/DHARMa/issues/207#issuecomment-702614970
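As a rough sketch of what I mean by emulating Pearson residuals via the simulations (just an illustration, not a finished proposal): use the per-observation SD of the simulations as the scale.

library(DHARMa)
testData <- createData(sampleSize = 200, family = poisson(), randomEffectVariance = 0)
fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)
res <- simulateResiduals(fittedModel, n = 250)
expectedSD <- apply(res$simulatedResponse, 1, sd)   # simulation-based SD per data point
pearsonSim <- (res$observedResponse - res$fittedPredictedResponse) / pmax(expectedSD, 0.001)
var(pearsonSim)  # should be roughly 1 if the dispersion matches the model family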

OK, but those were just initial thoughts - I have to look into these things in more detail in the next days.

florianhartig avatar Oct 27 '20 16:10 florianhartig

Hi,

no worries, take your time. I know I did, and I won't have much time this weekend anyway.

Just a short response to your initial thoughts. Issues 2 and 3 are, in my opinion, very much separate issues. If you look at my example for 3, I already use re.form = NULL.

Also something I haven't said explicitly, but maybe should have: with re.form = NULL the tests do produce essentially the same test statistic, up to being squared (and less so for my idea on small lambdas), but still:

require(DHARMa)
require(performance)
testData = createData(sampleSize = 500, family = poisson(), randomEffectVariance = 0, overdispersion = 2)
m2 <- glm(observedResponse ~ Environment1, 
          family = "poisson", data = testData)

res <- simulateResiduals(m2, plot = T)
test_old <- testDispersion(res)
test_old$statistic^2
check_overdispersion(m2)
exp(4) # for intercept 0, the ratio seems to be exp(2*overdispersion), because of the link function, I guess

Also keep in mind that random effects can't be overdispersed, as their variance is estimated separately, so including their variance in a measure of overdispersion dilutes the result.

I hope this clarifies my thinking.

Istalan avatar Oct 28 '20 14:10 Istalan

Hi Lukas,

OK, thanks for your email, I will respond to you in a sec, but I wanted to first recapitulate where we are here. I have read through this again; here are my current thoughts:

  1. whether to express overdispersion by variance or sd: this is a good point, I'm open to just changing the test statistic. I find sd more "geometric", but I guess variance is more proportional to likelihood for most models, and probably more in line with the traditional definition ...

  2. OK, on re-reading, I agree those are different points. Let's start with 2: yes, I was aware of this problem, and I agree that re-simulating the REs may be "diluting" the dispersion test. I had noted this in https://github.com/florianhartig/DHARMa/issues/149. What's not clear to me is: do you think that re.form = NULL solves the problem, or do you think the problem remains even with that?

  3. This, plus the following conversation: I would summarise that this is all about defining dispersion per data point, as opposed to "across all data", as it is currently, right? I agree that this is preferable, and had thought about this before, but hadn't come to a good solution.

OK, the problem if we want to define dispersion per data point is that we need a reference, and the obvious reference would be the simulated distribution. If the distribution for each data point can be expected to be uniform, we can use this to test directly, or transform to normal to get "simulation-based Pearson residuals".

The question now is whether it's really not uniform, and what the costs of that would be. When I run the following code

testData = createData(sampleSize = 50000, fixedEffects = 1, family = poisson(), randomEffectVariance = 0)
fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)

res <- simulateResiduals(fittedModel = fittedModel, n = 100)
plot(res)

I don't see a problem in the KS test. This only seems to appear at data size 500,000. OK, maybe not great, but I wonder if this is really a relevant issue in practice; I mean, with 5000 data points all tests will certainly become significant in a practical data analysis anyway.

I feel the more important problem to solve (from my experience with users) would be to supply relevant, interpretable indicator values other than the p-value for how big the problem is.

In this sense, if we could provide a meaningful parameter for the dispersion, with p-values that are approximately correct for data size 5000, that would still be progress.

florianhartig avatar Feb 04 '21 07:02 florianhartig

So, this is all saying: I think it's a good idea to try defining a dispersion test that uses the simulated dispersion per data point as a reference.

What I don't see, however, is how we can transform to Pearson, as the transformation from uniform to normal is problematic because of the "outliers".

So, we would probably have to test if the dispersion of the residuals is larger than one would expect under uniformity?

florianhartig avatar Feb 04 '21 07:02 florianhartig

Here are some further thoughts.

library(DHARMa)
testData = createData(sampleSize = 200, fixedEffects = 1, family = poisson(), randomEffectVariance = 0)
fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)

res <- simulateResiduals(fittedModel = fittedModel, n = 100)
plot(res)

# side note: I could find no inflated type I error of any of the current tests, even for very high sampleSize, as long as I moved n (simulations) to 1000 or higher.

# Option1: dispersion measured by variance of residuals against uniform 

obs = var(res$scaledResiduals) * 12
sim = replicate(1000, {var(runif(200))}) * 12

DHARMa:::getP(simulated = sim, observed = obs, alternative = "two.sided", plot = T)

# Option2: dispersion measured by variance of the normal, which should be something like a simulation-based deviance residual? 
# issue is how to handle the outliers

obs = var(residuals(res, quantileFunction = qnorm, outlierValues = c(-7,7))) 
sim = replicate(1000, {var(rnorm(200))}) # I know we could do this analytically, but I'm lazy
DHARMa:::getP(simulated = sim, observed = obs, alternative = "two.sided", plot = T)

florianhartig avatar Feb 04 '21 22:02 florianhartig

Hi @Istalan - if you work on this, also note that there is the option with refit = T in testDispersion, which was intended as a non-parametric version of the GLMM Wiki test https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#overdispersion. I suspect that this test actually works quite OK in most scenarios; the downsides are that a) it's slow, and b) it requires Pearson residuals, which are not available for e.g. zero-inflated models.
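For completeness, a minimal sketch of how the refit-based variant is invoked (refit = TRUE goes to simulateResiduals; the repeated refits are what makes it slow):

library(DHARMa)
testData <- createData(sampleSize = 200, family = poisson(), randomEffectVariance = 0)
fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)
res_refit <- simulateResiduals(fittedModel, n = 100, refit = TRUE)  # slow: 100 refits
testDispersion(res_refit)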

florianhartig avatar Feb 05 '21 19:02 florianhartig

Just to document the progress on this: This is produced with fc37560232def2f08dafc237900151aa2e0d23e5, testing a number of dispersion test variants

[figure: p-value distributions of the tested dispersion test variants; see the p.s. below for the variant definitions]

My conclusions so far:

  • the default DHARMa tests (now updated with standardized variance instead of sd, to conform to traditional dispersion parameters, but in terms of p-values unchanged) perform surprisingly well, if we use conditional simulations, which are indeed more sensitive than unconditional simulations for GLMMs with a medium to small number of REs
  • the transformation to a Pearson-type simulation-based test (the last option) doesn't seem to make a difference, at least in my simulations
  • none of the suggested simulation-based alternatives (alternatives 1-2) to the current dispersion tests is an improvement
  • the test of Pearson residuals against a chi² distribution suffers from the df problem for REs; for that reason, I have tested only alternative = "greater". I find this a major drawback.

Current thoughts / actions:

  • What seems doable / stable so far is to generate an SD based on the simulations, and use this to generate simulation-based Pearson residuals, which could be used instead of the raw residuals.
  • These Pearson residuals could be used in simulation-based tests, or in chi² tests. The problem with the chi² tests is that the df for mixed models is not known, which creates notable problems, in particular for detecting underdispersion.
  • This leaves simulation-based tests. Here, I couldn't see any differences so far.
  • Because of the possible division-by-zero issue in the simulation-based SD for the Pearson approach, I would therefore move to the Pearson option ONLY if we see an advantage.

For all residuals, there seems to be an additional possibility of underdispersion due to REs overfitting the data, but I don't feel that this is a real issue of the tests; we have found this also, e.g., in Roberts, David R., et al. "Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure." Ecography 40.8 (2017): 913-929.

p.s.: to interpret the figure, the tested variants are

  res <- simulateResiduals(fittedModel = fittedModel, n = 250)
  out$DHARMaDefault = testDispersion(res, plot = FALSE)$p.value
  
  res2 <- simulateResiduals(fittedModel = fittedModel, n = 250, re.form = NULL)
  out$DHARMaConditional = testDispersion(res2, plot = FALSE)$p.value
  
  out$DHARMaPearson = testDispersion(res, plot = FALSE, type = "Pearson", alternative = "greater")$p.value
  
  # NEW Option 1: dispersion measured by variance of residuals against uniform 
  obs1 = var(res$scaledResiduals) * 12
  sim1 = replicate(1000, {var(runif(length(res$scaledResiduals)))}) * 12
  out$VarResSimuUni = DHARMa:::getP(simulated = sim1, observed = obs1, plot = F, alternative = "two.sided")
  
  # NEW Option 2: dispersion measured by variance of the normal, which should be something like a simulation-based deviance residual? 
  # issue is how to handle the outliers
  obs2 = var(residuals(res, quantileFunction = qnorm, outlierValues = c(-7,7))) 
  sim2 = replicate(1000, {var(rnorm(length(res$scaledResiduals)))}) # I know we could do this analytically, but I'm lazy
  out$VarResSimuNorm = DHARMa:::getP(simulated = sim2, observed = obs2, plot = F, alternative = "two.sided")
  
  # NEW Option 3 - probably problematic for situations with binomial / Poisson only 0 / 1 simulations
  res <- simulateResiduals(fittedModel = fittedModel, re.form  = NULL)
  expectedSD = apply(res$simulatedResponse, 1, sd)
  spread <- function(x) sd((x - res$fittedPredictedResponse) / pmax(expectedSD, 0.001)) # pmax: elementwise floor to avoid division by zero
  out$PearsonSimu = testGeneric(res, summary = spread, plot = F)$p.value

florianhartig avatar Feb 17 '21 09:02 florianhartig

I have a few issues with this. Admittedly I haven't been keeping up with the development version of DHARMa, to avoid disrupting my old code, so our results might differ.

  1. You are using -7, 7 as outlier values; please replace that with the result of this function:
calc_expected_outlier_val <- function(nSim){
  
  integrate_fun <- function(t, n){
    t*dnorm(t)*n*(pnorm(t)^(n-1))
  }
  # Expected maximum of nSim + 1 values
  expected_outlier <- integrate(integrate_fun, n = nSim+1, 
                                lower = -Inf, upper = Inf)
  return(expected_outlier$value)
}

For nSim = 250 the result is 2.82.

  2. The uniform and normal residuals should also be calculated as conditional residuals (so res2, not res); it's also unnecessary to make the extra call in Option 3, as you already have res2. You'll see their power increase a lot, and even their distribution under H0 becomes better. The latter, however, is a mirage, as the values get shrunk by the reduction in residual degrees of freedom from the random coefficients; the underlying statistic is still too large for what it should be.

  3. Your plot function doesn't reset par() after it is done, and it also doesn't work with the results from parallel computing.

  4. The big thing I investigated yesterday but couldn't put into readable form: the current DHARMa method is also impacted by the residual degrees of freedom. In this simple case you get the shrinkage you'd expect, i.e. a variance ratio of 0.75:

dat <- as.data.frame(replicate(500, rnorm(2000)))

fittedModel3 <- lm(V1 ~ ., data = dat)
summary(coef(fittedModel3))
res <- simulateResiduals(fittedModel3, plot = T)
testDispersion(res)

In mixed Poisson models, however, there is a strange effect where more random coefficients grow the test statistic instead of shrinking it.

Here, a small number of random coefficients produces symmetrical, if too narrow*, results around 1. (*The narrowness comes from not properly accounting for the expected changes in variance in Poisson models. Non-mixed Poisson models look the same.)

require(DHARMa)
require(lme4)
require(performance)

`%dopar%` <- foreach::`%dopar%`

my_simulate <- function(FUN, n_sim){
  cores <- parallel::detectCores() - 1
  message("parallel, set cores automatically to ", cores)
  cl <- parallel::makeCluster(cores)
  
  doParallel::registerDoParallel(cl)
  res <- foreach::foreach(i=1:n_sim, .packages=c("lme4", "DHARMa", "performance"), .combine = rbind) %dopar% 
    FUN()
  parallel::stopCluster(cl)
  return(res)
}
# test with 10 groups
test_fun2 <- function(){
  overdispersion = 0.0
  testData = createData(sampleSize = 5000, fixedEffects = 1, family = poisson(), randomEffectVariance = 1, 
                        overdispersion = overdispersion, numGroups = 10)
  
  fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)
  res <- simulateResiduals(fittedModel = fittedModel, re.form  = NULL, plot = F)
  x <- testDispersion(res, plot = F)
  c(x$statistic, x$p.value)
}

test_fun2()
test_res <- my_simulate(test_fun2, 1000)

hist(test_res[, 2], breaks = 20, probability = T)
abline(h = 1)
plot(test_res)
abline(h = 0.05)
hist(test_res[, 1], breaks = 20, probability = T)
curve(dnorm(x, mean = mean(test_res[, 1]), sd = sd(test_res[, 1])), add = T) # just something that interested me

Now, if the number of random coefficients goes up, so does the test statistic, even though it very much shouldn't:

# test with 1000 groups
test_fun <-  function(){
  overdispersion = 0.0
  testData = createData(sampleSize = 5000, fixedEffects = 1, family = poisson(), randomEffectVariance = 1, 
                        overdispersion = overdispersion, numGroups = 1000)
  
  # fittedModel <- glm(observedResponse ~ Environment1, family = "poisson", data = testData)
  fittedModel <- glmer(observedResponse ~ Environment1 + (1|group), family = "poisson", data = testData)
  

  # Condition on random effects
  res <- simulateResiduals(fittedModel = fittedModel, re.form  = NULL, plot = F)
  x <- testDispersion(res, plot = F)
  c(x$statistic, x$p.value)
}


test_fun()
test_res <- my_simulate(test_fun, 1000)
colMeans(test_res)

hist(test_res[, 2], breaks = 20, probability = T)
abline(h = 1)
plot(test_res)
abline(h = 0.05)
hist(test_res[, 1], breaks = 20, probability = T)
curve(dnorm(x, mean = mean(test_res[, 1]), sd = sd(test_res[, 1])), add = T)

What do I think

I think almost all the tests that operate on the conditional residuals, as they should, don't differ much. They're all impacted by the shrinkage from fewer degrees of freedom, and at the scale you're testing they will become significant quite quickly. The current DHARMa test doesn't deal with the differences in theoretical variance, which produces some odd behavior, and that's the main reason for changing it.

I'm currently thinking about using a chi-squared distributed test statistic with a single refit of the model to obtain sensible bounds for the degrees of freedom. It is, however, in the early stages:

n <- 5000  # sample size; not specified in the original snippet, chosen here so the code runs standalone
testData = createData(sampleSize = n, fixedEffects = 1, family = poisson(), randomEffectVariance = 1, 
                      overdispersion = 0, numGroups = 2500, intercept = 3)
fittedModel <- glmer(observedResponse ~ Environment1 + x + y + (1|group), family = "poisson", data = testData)


sum_of_squares <- function(fittedModel){
  lambdas <- predict(fittedModel, type = "response")
  observedResponse <- getObservedResponse(fittedModel)
  
  # randomized PIT residuals for the Poisson response
  PIT_res <- sapply(seq_along(observedResponse), function(i){
    maxVal <- ppois(observedResponse[i], lambda = lambdas[i])
    minVal <- ppois(observedResponse[i] - 0.5, lambda = lambdas[i])
    return(runif(1, minVal, maxVal))
  })
  
  # transform to standard normal and sum the squares
  z_res <- qnorm(PIT_res)
  sum(z_res^2)
}

sum_of_squares(fittedModel)

#simulated_df <- sum_of_squares(glmer(unlist(simulate(fittedModel)) ~ Environment1 + x + y + (1|group), family = "poisson", data = testData))
simulated_df <- sum_of_squares(getRefit(fittedModel, newresp = unlist(simulate(fittedModel))))
low_df <- qchisq(0.05, df = simulated_df)
high_df <- qchisq(0.95, df = simulated_df)
chi_sq <- sum_of_squares(fittedModel)
pchisq(chi_sq, df = low_df)
pchisq(chi_sq, df = high_df, lower.tail = F)
deviance_ratio <- chi_sq/simulated_df
deviance_ratio
x <- replicate(10, sum_of_squares(getRefit(fittedModel, newresp = unlist(simulate(fittedModel)))))
mean(x)

I'm quite busy with family stuff for the next few days, so I'll maybe be back late on Friday with how that worked out.

Istalan avatar Feb 17 '21 11:02 Istalan

Hi Lukas,

a) Plots for parallel runs work for me; I think this was an old error that we corrected two weeks ago in the dev branch. The par settings are now also reset. I have now removed the extra function again, and implemented S3 plot functions for the results of runBenchmarks (all in the overdispersion branch). We will still test this, but I think it works fine, and it makes the code to discuss the overdispersion issues a bit less verbose.

b) About your correction of the outliers: OK, I will look into this. I suppose this would actually be something to implement in the residuals.DHARMa function, because it could have utility independent of this particular issue. I guess the overarching question is how to define simulation-based Pearson residuals.

c) OK, I take your point: when we mix conditional / unconditional, we are comparing apples with oranges. The results seem to suggest that conditional is always better, so we should probably compare everything with conditional simulations.

d) About the df / shrinkage / overfitting problem: I think there are really two problems that we have to distinguish: i) the df in the chi-sq test, and ii) that the REs might overfit to the data. I think problem i) is unique to the chi-sq based tests, while ii) is a general problem of mixed models that we already discuss in the Roberts paper above, namely that predictions with mixed models overfit to a higher degree than most people assume if you predict conditional on the fitted REs (see the Roberts paper).

As the GLMM Wiki chi-sq test approximates the df of an RE with 1, i.e. from below, both effects go in the same direction (underdispersion), but I think those are really separate problems, which are (a sketch of that test follows after this list):

  • df.residual() underestimates the df used by an RE
  • RE models overfit if you predict conditional on the fitted REs
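For reference, the GLMM Wiki test mentioned above is roughly the following (reproduced from memory from the linked FAQ, so check there for the canonical version); the df problem enters through df.residual(), which, as noted above, counts an RE only by its variance parameters:

overdisp_fun <- function(model) {
  rdf <- df.residual(model)                     # residual df as reported by the model
  rp  <- residuals(model, type = "pearson")
  Pearson.chisq <- sum(rp^2)
  prat <- Pearson.chisq / rdf                   # dispersion ratio
  pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
  c(chisq = Pearson.chisq, ratio = prat, rdf = rdf, p = pval)
}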

florianhartig avatar Feb 17 '21 12:02 florianhartig

Hi Florian,

I've thought a little about REs overfitting and I don't think it's "real". Random coefficients are still a type of residual, just transformed to the scale of the coefficients, i.e. linear. They can't overfit because they're not really part of the fit. As a matter of fact, with observation-level random coefficients a GLMM can turn into a linear model:

n <- 1000
testData = createData(sampleSize = n, fixedEffects = 1, family = poisson(), randomEffectVariance = 1, 
                      overdispersion = 0, numGroups = 100,
                      # only works if there are no 0's in the observedResponse
                      intercept = 10) 
fittedModel <- glmer(observedResponse ~ Environment1 + x + y + (1|ID), family = "poisson", data = testData)
summary(fittedModel)

summary(lm(log(observedResponse) ~ Environment1 + x + y, data = testData))

I've also read the article (https://onlinelibrary.wiley.com/doi/full/10.1111/ecog.02881), and I would be lying if I said I understood everything, but my takeaway is that when the covariates and the correlation structure of the models are correlated, too much of the correlation's effect might be attributed to the covariates, creating an overfit. I would interpret that as: some variance that really belongs to the RE is attributed to a main effect. Again, I'm not sure; the article is long and I'm not familiar with a lot of the methods.

Anyway, I've kept working under the assumption that the conditional residuals having few degrees of freedom should not be an issue for the DHARMa dispersion test, and I think my result is workable.

Estimating dispersion with normal residuals and one or few refits

I didn't do the thing with upper and lower bounds on the true df. Instead, I simulate the distribution of the df given the simulated df of a few refits; one should be sufficient in most situations. The result is not exactly computationally cheap, but it should run in 1 or 2 seconds on most machines.

I made a small simplification for simulating the distribution, but it shouldn't impact anything for basic testing.

Here is the function to be run on a model. It would obviously need a lot of reworking to fit into DHARMa, but I figured it would be better to get this out there today instead of mulling over it on my own.

test_dispersion_z_sim_df <- function(fittedModel, n_sim_df = 1, n_sim_stat = 5000, plot = F){
  
  sum_of_squares <- function(fittedModel){
    # this should obviously be the input in the package
    res <- simulateResiduals(fittedModel = fittedModel, re.form  = NULL, plot = F)
    z_res <- residuals(res, quantileFunction = qnorm, outlierValues = c(-1,1) * 2.820478)
    return(sum(z_res^2))
  }
  
  observed_chi_sq <- sum_of_squares(fittedModel)
  # mean of the result from n_sim_df refits
  simulated_df <- mean(replicate(n_sim_df, 
                                 sum_of_squares(getRefit(fittedModel, newresp = unlist(simulate(fittedModel))))
  ))
  
  # distribution of the simulated_df 
  ## this technically makes a small mistake: it simulates df values according to how probable
  ## they are under Chi^2(df = simulated_df), instead of how probable simulated_df is under Chi^2 with that df.
  ## it makes no real difference and would be much more complex. 
  df_for_sim <- rowMeans(replicate(n_sim_df, rchisq(n_sim_stat, df = simulated_df)))
  # distribution of the test statistic
  sim <- rchisq(n_sim_stat, df = df_for_sim)
  p_val <- DHARMa:::getP(simulated = sim, observed = observed_chi_sq, alternative = "two.sided", plot = plot)
  
  return(c(observed_chi_sq/simulated_df, p_val))
}

Here's some code where I tested it. I've experimented with different settings for n, the number of groups, the intercept and n_sim_df, and found that the p-values follow a flat distribution throughout. With overdispersion = 0.3, n = 500 and n_sim_df = 1, the probability of p < 0.05 was roughly 0.3.

I redefine the function inside test_fun, because I haven't been able to load my own function into the namespace of the parallel processes.

# the usual my_simulate function
test_fun <-  function(){
  n <- 500
  testData = createData(sampleSize = n, fixedEffects = 1, family = poisson(), randomEffectVariance = 1, 
                        overdispersion = 0.3, numGroups = 100, intercept = 0)
  fittedModel <- glmer(observedResponse ~ Environment1 + x + y + (1|group), family = "poisson", data = testData)
  
  test_dispersion_z_sim_df <- function(fittedModel, n_sim_df = 1, n_sim_stat = 5000, plot = F){
    
    sum_of_squares <- function(fittedModel){
      # this should obviously be the input in the package
      res <- simulateResiduals(fittedModel = fittedModel, re.form  = NULL, plot = F)
      z_res <- residuals(res, quantileFunction = qnorm, outlierValues = c(-1,1) * 2.820478)
      return(sum(z_res^2))
    }
    
    observed_chi_sq <- sum_of_squares(fittedModel)
    # mean of the result from n_sim_df refits
    simulated_df <- mean(replicate(n_sim_df, 
                                   sum_of_squares(getRefit(fittedModel, newresp = unlist(simulate(fittedModel))))
    ))
    
    # distribution of the simulated_df 
    ## this technically makes a small mistake: it simulates df values according to how probable
    ## they are under Chi^2(df = simulated_df), instead of how probable simulated_df is under Chi^2 with that df.
    ## it makes no real difference and would be much more complex. 
    df_for_sim <- rowMeans(replicate(n_sim_df, rchisq(n_sim_stat, df = simulated_df)))
    # distribution of the test statistic
    sim <- rchisq(n_sim_stat, df = df_for_sim)
    p_val <- DHARMa:::getP(simulated = sim, observed = observed_chi_sq, alternative = "two.sided", plot = plot)
    
    return(c(observed_chi_sq/simulated_df, p_val))
  }
  
  test_dispersion_z_sim_df(fittedModel, n_sim_df = 1)
}


test_fun()

n_sim <- 1000 
system.time(test_fun()) * n_sim / 60
test_res <- my_simulate(test_fun, n_sim)
colMeans(test_res)

lims_95_perc <- qbinom(p = c(0.025, 0.975), size = n_sim, prob = 0.05)/(n_sim * 0.05)
hist(test_res[, 2], breaks = 20, probability = T)
sum(test_res[, 2] < 0.05)/n_sim
abline(h = 1)
abline(h = lims_95_perc, lty = 2)

plot(test_res)
abline(h = 0.05)

hist(test_res[, 1], breaks = 20, probability = T)
curve(dnorm(x, mean = mean(test_res[, 1]), sd = sd(test_res[, 1])), add = T)

Istalan avatar Feb 19 '21 17:02 Istalan

Hi Lukas, sorry for the silence, I'm dragged down by other commitments at the moment, will look at this as soon as I get some time free!

florianhartig avatar Mar 01 '21 13:03 florianhartig