`mice.impute.ml.lmer` on large three-level dataset: `"binary"` logistic model returns error, 'hangs' when adding random slopes or interactions
I am currently trying to impute a three-level dataset with 87 columns and 71,756 rows. The variables comprise 4 identifier columns, 15 continuous outcome variables without missing entries, and 68 predictors and covariates with missing entries:
- On level 1 (lowest, representing the individual) there are 16 ordinal and 20 dichotomous variables,
- on level 2 there are 28 continuous variables, and
- on level 3 (top) there are 4 ordinal variables.
I've been following Simon Grund's example for modeling three-level data using `mice` with the `mice.impute.ml.lmer` function. Naturally, I had to make some adaptations to the example model to fit my data:

- I tried setting `model` to `"binary"` to run a logistic mixed-effects model for the dichotomous variables (`"pmm"` for the ordinal variables, `"continuous"` for the continuous ones).
- I tried adding random slopes and interaction effects.
- `mice.impute.2lonly.pmm` was used instead of `mice.impute.2lonly.norm` for the top-level imputation.
- I added post-processing to a level-2 variable, setting upper and lower boundaries.
However, when running `mice` with some variables modeled as `"binary"` (without random slopes or interactions), I get the following warning:

```
Warning message in commonArgs(par, fn, control, environment()):
“maxfun < 10 * length(par)^2 is not recommended.”
```

Execution of `mice` hangs at this point.
I ran a test with `mice` (1 iteration), this time with all dichotomous variables as `"pmm"`, and this time the function completed the run. However, after adding variables to `random_slopes`, it seemingly gets stuck (running indefinitely) on the imputation of the first three variables. Now, my assumption is that this is due to the relatively large dataset, making the process computationally very demanding.
I am wondering what exactly causes this error message, and if there are ways to avoid it. Also, I would like to know if there are ways to improve computational efficiency of such a large model.
I am not very familiar with `mice`, but I have some thoughts regarding how the data is imputed:

I am planning to use the imputed data for a structural equation model I've built, where all the variables are grouped into indicators of latent constructs. It therefore seems natural that indicator variables belonging to the same construct are imputed together.
- In `mice` there is an argument called `blocks` which allows for multivariate imputation of the variables grouped together as list elements. However, creating blocks containing variables from different levels led to an error message that no top level was defined in the `predictorMatrix` (i.e. no block set to `-2`). As an alternative, it seems the `formulas` argument can be used in place of a predictor matrix. This option seems ideal, as it allows user-defined formulas for each block. Also, if I understand the whole process correctly, the `predictorMatrix` is only passed on to `mice.impute.2lonly.pmm` and not to `mice.impute.ml.lmer`. The question, then, is whether the `formulas` argument can be used to define three-level models using `lme4` syntax, and whether these user-defined models in `formulas` can be passed on to `mice.impute.ml.lmer`. As a more general question, why can't `mice.impute.ml.lmer` be used for imputation at the top level? (At least, it didn't work when I tried.)
- Then there's also an argument `group_index` in `mice.impute.ml.lmer`, used to pass group identifiers to `mice.impute.bygroup`. From reading the documentation I am still unsure what this function actually does, as I can find little information on it. It seems it is designed for grouping variables together by level, but not for grouping variables across different levels, correct? What would distinguish `mice.impute.bygroup` from creating blocks? And what would be the difference between doing this and calling models in `mice.impute.ml.lmer`?
- As for computational efficiency, I have no idea if grouping variables together would increase it. I could really use some advice on this part.
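To make the `formulas` question concrete, this is the kind of specification I have in mind. The names are again placeholders, and the `lme4`-style random-effects terms are hypothetical: whether `mice` accepts them inside `formulas` is exactly what I am asking.

```r
# group the indicators of each latent construct into one multivariate block
blocks <- list(constructA = c("a1", "a2", "a3"),
               constructB = c("b1", "b2"))

# one formula per block; multivariate left-hand sides are joined with "+".
# The (1 | id_l2) and (1 | id_l3) terms are lme4 syntax -- can they be used here,
# and would they reach mice.impute.ml.lmer?
formulas <- list(
  constructA = a1 + a2 + a3 ~ z_l2 + w_top + (1 | id_l2) + (1 | id_l3),
  constructB = b1 + b2 ~ z_l2 + (1 + z_l2 | id_l2)
)

imp <- mice(dat, blocks = blocks, formulas = formulas, m = 5, maxit = 1)
```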