
Missing At Random missing data

drizopoulos opened this issue on Dec 19, 2018 · 4 comments

In the case of mixed models we often have to deal with incomplete data; for example, in longitudinal studies subjects frequently drop out before the end of follow-up. In this context, the framework of mixed models provides correct inferences under both the missing completely at random mechanism (missingness does not depend on the outcome) and the missing at random mechanism (missingness depends on the observed outcomes).

The goodness-of-fit method implemented in DHARMa works in the missing completely at random case, but it does not seem to work under the missing at random one. A script illustrating the issue is provided below.

Making the connection to posterior predictive checks, I think that a potential solution could be to impute the missing data from the fitted model and apply the procedure to the combined imputed and observed data (a rough sketch of this idea is given after the results below).

library("lme4")
#> Loading required package: Matrix
library("DHARMa")
#> Note that, since v0.1.6.2, DHARMa includes support for glmmTMB, but there are still a few minor limitations associatd with this package. Please see https://github.com/florianhartig/DHARMa/issues/16 for details, in particular if you use this for production.

simulate_data <- function (seed) {
    set.seed(seed)
    n <- 200 # number of subjects
    K <- 8 # number of measurements per subject
    t_max <- 5 # maximum follow-up time
    
    # we construct a data frame with the design:
    # everyone has a baseline measurement, and then measurements at random follow-up times
    DF <- data.frame(id = rep(seq_len(n), each = K),
                     time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                     sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
    
    # design matrices for the fixed and random effects
    X <- model.matrix(~ sex * time, data = DF)
    Z <- model.matrix(~ time, data = DF)
    
    betas <- c(12, -1.2, 1.6, -0.9) # fixed effects coefficients
    sigma <- 0.5 # standard deviation of the error terms
    D11 <- 0.5 # variance of random intercepts
    D22 <- 0.1 # variance of random slopes
    
    # we simulate random effects
    b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
    # linear predictor
    eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
    # we simulate normal longitudinal data
    DF$y <- rnorm(n * K, mean = eta_y, sd = sigma)
    
    # missing completely at random missing data
    DF$y2 <- DF$y # copy of y
    set_to_na_mcar <- function (x) {
        ind <- seq(2, length(x))
        how_many_nas <- sample(K-2, 1)
        x[sample(ind, how_many_nas)] <- NA
        x
    }
    DF$y2 <- with(DF, ave(y2, id, FUN = set_to_na_mcar))
    
    # missing at random missing data
    DF$y3 <- DF$y # copy of y
    # if any post-baseline y exceeds 13, the subject drops out from the next measurement onwards
    set_to_na_mar <- function (x) {
        nx <- length(x)
        ind <- which(x[-1] > 13)[1] + 2
        if (!is.na(ind) && ind < nx)
            x[seq(ind, nx)] <- NA
        x
    }
    DF$y3 <- with(DF, ave(y3, id, FUN = set_to_na_mar))
    
    DF
}

##########################################################################################
##########################################################################################

M <- 100
betas1 <- betas2 <- betas3 <- matrix(0.0, M, 4)
p_values1 <- p_values2 <- p_values3 <- numeric(M)
for (m in seq_len(M)) {
    DF <- simulate_data(2018 + m)
    ##
    # fit the model and residuals in complete data
    fm1 <- lmer(y ~ sex * time + (time | id), data = DF)
    betas1[m, ] <- fixef(fm1)
    res1 <- simulateResiduals(fittedModel = fm1, n = 500)
    p_values1[m] <- testUniformity(simulationOutput = res1, plot = FALSE)$p.value
    ###
    # fit the model and residuals in missing completely at random
    fm2 <- lmer(y2 ~ sex * time + (time | id), data = DF)
    betas2[m, ] <- fixef(fm2)
    res2 <- simulateResiduals(fittedModel = fm2, n = 500)
    p_values2[m] <- testUniformity(simulationOutput = res2, plot = FALSE)$p.value
    ###
    # fit the model and residuals in missing at random
    fm3 <- lmer(y3 ~ sex * time + (time | id), data = DF)
    betas3[m, ] <- fixef(fm3)
    res3 <- simulateResiduals(fittedModel = fm3, n = 500)
    p_values3[m] <- testUniformity(simulationOutput = res3, plot = FALSE)$p.value
}

# Bias
true_betas <- c(12, -1.2, 1.6, -0.9)
colMeans(betas1 - rep(true_betas, each = M))
#> [1]  0.013800880 -0.010452142 -0.001517219 -0.002801698
colMeans(betas2 - rep(true_betas, each = M))
#> [1]  0.0098002693 -0.0056694382  0.0002078754 -0.0050659936
colMeans(betas3 - rep(true_betas, each = M))
#> [1]  0.011886233 -0.011053783  0.005843979 -0.007349007

# p-value for Goodness-of-Fit
summary(p_values1)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.06809 0.36906 0.63980 0.58937 0.78810 0.98741
summary(p_values2)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.01241 0.43845 0.62246 0.60387 0.81227 0.99493
summary(p_values3)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.000e+00 3.800e-09 5.491e-08 6.299e-06 1.684e-06 1.888e-04
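
To make the imputation idea above concrete, here is a rough, hypothetical sketch (it is not an established DHARMa workflow, just one way it could look): draw the missing y3 values from the MAR model conditional on its fitted random effects, refit on the completed data, and rerun the DHARMa check. It reuses DF and fm3 from the script above; DF_imp, fm3_imp and res3_imp are made-up names, and a proper version would repeat the imputation and propagate parameter uncertainty.

# single stochastic imputation of the MAR-missing responses from the fitted model
miss <- is.na(DF$y3)
mu_miss <- predict(fm3, newdata = DF[miss, ], re.form = NULL) # conditional on fitted REs
DF_imp <- DF
DF_imp$y3[miss] <- rnorm(sum(miss), mean = mu_miss, sd = sigma(fm3))

# refit and check residuals on the combined observed + imputed data
fm3_imp <- lmer(y3 ~ sex * time + (time | id), data = DF_imp)
res3_imp <- simulateResiduals(fittedModel = fm3_imp, n = 500)
testUniformity(simulationOutput = res3_imp, plot = FALSE)$p.value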

drizopoulos · Dec 19 '18

Hi Dimitris,

thank you for this comprehensive example, this is interesting. For people (like me) who are less versed in medical stats terminology, here is the definition of MAR from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688419/:

Missing at random (MAR)—Data following dropout are MAR if the future statistical behaviour of a participant’s outcomes, given his or her past outcomes and covariates, would have been the same whether he/she had dropped out or not. For example, missing post-baseline quality of life data may occur in patients with low baseline quality of life. However, after this information has been taken into account (using baseline quality of life as a covariate in a regression model), no systematic differences exist between dropouts and completers.

So, what you are doing here is a kind of repeated measures ANOVA, where people drop out when their observed value exceeds a certain threshold.

About your question: the significant p-values go away when you tell DHARMa to work with conditional simulations, i.e. to not re-simulate the random effects but to condition on the fitted ones:

res3 <- simulateResiduals(fittedModel = fm3, n = 500, re.form = NULL)
p_values3[m] <- testUniformity(simulationOutput = res3, plot = FALSE)$p.value
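
For context, here are the two simulation modes side by side (a minimal sketch; the re.form argument is passed through to lme4's simulate method, where NULL conditions on the fitted random effects and the default re-simulates them):

# unconditional simulations (the DHARMa default): random effects are re-drawn
res_uncond <- simulateResiduals(fittedModel = fm3, n = 500)
# conditional simulations: keep the fitted random effects (equivalent to use.u = TRUE)
res_cond <- simulateResiduals(fittedModel = fm3, n = 500, re.form = NULL)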

My interpretation of this is that the MAR mechanism does actually create a bias in the mixed model, but only in the random effects. What likely happens is that subjects with a stronger time effect get censored, so they have less data and thus possibly get a stronger regularisation (shrinkage) bias on their REs. Thus, if you check whether the data correspond to the fitted model:

  1. DHARMa tests are positive if you test the entire model structure;
  2. DHARMa tests are negative if you only test the model conditional on the fitted REs.

It would be interesting to actually confirm this by looking at the REs (you would have to check individual REs though; I don't think this will show up in the variances).
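
A hypothetical way to do that, assuming fm1, fm3 and DF from the script above (the object names below are made up): compare the fitted individual random slopes of the complete-data fit and the MAR fit, marking subjects with missing values.

# compare individual random slopes: complete-data fit (fm1) vs MAR fit (fm3);
# under the shrinkage explanation, dropouts should be pulled more strongly towards zero in fm3
slope_complete <- ranef(fm1)$id[, "time"]
slope_mar      <- ranef(fm3)$id[, "time"]
dropped <- with(DF, tapply(is.na(y3), id, any))  # subjects with any missing y3

plot(slope_complete, slope_mar, col = ifelse(dropped, "red", "black"),
     xlab = "random slope, complete data (fm1)",
     ylab = "random slope, MAR data (fm3)")
abline(0, 1, lty = 2)  # deviations from the 1:1 line indicate differences in the fitted REs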

I'm not sure what general lessons to draw from the example.

  • On the one hand, this is slightly misleading to the user, because DHARMa says there is a problem whereas the results would be useable

  • On the other hand, I think the package is doing exactly what it's supposed to do, and I am actually quite pleased that this confirms my firm opinion that the default should be to re-simulate the entire RE structure (unlike what is suggested in many textbooks, which simulate only the top level conditional on the fitted REs, corresponding to re.form = NULL).

I think the bottom line for me is this: when simulating the entire RE structure, DHARMa correctly highlights that the data were created by a different process than the one assumed by the model. In this case, however, this doesn't create a bias in the fixed effects. More generally, there are probably many cases where it is legitimate to use a model, or parts of the results of a model, even when the model doesn't fit the data. However, for an isolated dataset, DHARMa cannot decide whether this is the case. It's basically up to the user to understand which assumptions have to be met (and which parts of the residuals have to be checked) if they use "partially wrong" models for the analysis.

Does this make any sense to you? If you have any ideas on how to better report these issues to the user, they would be appreciated. Unfortunately, I cannot think of any general rule that would allow me to diagnose whether a particular issue will only bias the REs or similar. It's probably not generally true that if the conditional residuals look good, the fixed effects can be trusted.

florianhartig · Dec 20 '18

Hi Florian,

Thanks for your detailed response. Indeed, the issue with Missing At Random missing data is that the sample you end up with is not a random sample from the target population; it is a selected sample. Nonetheless, likelihood-based inference still provides correct results.

On the one hand, I see and accept your point that DHARMa recognizes that the sampling mechanism that gave rise to the observed data is not the same as the sampling mechanism implied by the fitted model. On the other hand, as you also mentioned, it can be misleading for users if DHARMa says that there is a problem with the model even though there is none. That is, in situations with missing data (which, practically speaking, is almost always the case), we won't be able to know whether the model is wrong or whether the apparent misfit is only attributable to the missing data.

To my knowledge, this topic has received some attention in the closely related method of posterior predictive checks; for example, check here. I don't know, though, how easy it would be to implement something like this in DHARMa.

drizopoulos · Dec 21 '18

Hi Dimitris,

I guess I could implement something like this, but the question is how DHARMa should know when to apply this procedure. Moreover, there is the MAR case, but there are probably many similar cases in other fields.

But even if we consider only the MAR case: how should DHARMa actually know that the data are MAR? From what I understand, it's not trivial to check this. It seems to me that if a user has sufficient statistical proficiency to be sure that the data are MAR, they should also be proficient enough to understand how to extract the residuals from DHARMa and apply appropriate corrections.
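
For completeness, a minimal example of what extracting the residuals looks like (residuals() is the standard accessor for a DHARMa object; which correction to apply afterwards is up to the user):

# the scaled (quantile) residuals can be pulled out of the DHARMa object directly,
# e.g. to plot them against covariates of interest or to run a user-defined test
scaled_res <- residuals(res3)
head(scaled_res)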

What I could possibly do is create a section in the vignette about "apparent" residual problems and mention MAR as one case (I can think of a few other situations where residuals are easily misinterpreted, regarding both false positives and false negatives).

florianhartig · Dec 31 '18

Added a comment about this in the vignette in 0.2.1, will add a bit more in 0.2.3. Will close this for now.

florianhartig · Jan 20 '19