`mice.impute.ml.lmer` on large three-level dataset: `"binary"` logistic model returns error, 'hangs' when adding random slopes or interactions
I am currently trying to impute a three-level dataset with 87 columns and 71,756 rows. The variables comprise 4 identifier columns, 15 continuous outcome variables without missing entries, and 68 predictors and covariates with missing entries:
- On level 1 (lowest, representing the individual) there are 16 ordinal and 20 dichotomous variables,
- on level 2 there are 28 continuous variables, and
- on level 3 (top) there are 4 ordinal variables.
I've been following Simon Grund's example for modeling three-level data using `mice` with the `mice.impute.ml.lmer` function. Naturally, I had to make some adaptations to the example model to fit my data:

- I tried setting `model` to `"binary"` to run a logistic mixed-effects model for the dichotomous variables (`"pmm"` for the ordinal variables, `"continuous"` for the continuous ones).
- I tried adding random slopes and interaction effects.
- `mice.impute.2lonly.pmm` was used instead of `mice.impute.2lonly.norm` for the top-level imputation.
- I added post-processing to a level-2 variable, setting upper and lower boundaries.
However, when running `mice` with some variables modeled as `"binary"` (without random slopes or interactions), I get the following warning:

```
Warning message in commonArgs(par, fn, control, environment()):
“maxfun < 10 * length(par)^2 is not recommended.”
```

Execution of `mice` hangs at this point.
I ran a test with `mice` (1 iteration), this time with all dichotomous variables as `"pmm"`, and this time the function completed the run. However, after adding variables to `random_slopes`, it seemingly gets stuck (running indefinitely) on the imputation of the first three variables. Now, my assumption is that this is due to the relatively large dataset, making the process computationally very demanding.
I am wondering what exactly causes this error message, and if there are ways to avoid it. Also, I would like to know if there are ways to improve computational efficiency of such a large model.
I am not very familiar with `mice`, but I have some thoughts regarding how the data is imputed:

I am planning to use the imputed data for a structural equation model I've built, where all the variables are grouped into indicators of latent constructs. It therefore seems natural that indicator variables belonging to the same construct are imputed together.
- In `mice` there is an argument called `blocks` which allows for multivariate imputation of the variables grouped together as list elements. However, creating blocks containing variables from different levels led to an error message that no top level was defined in the `predictorMatrix` (i.e. no block set to `-2`). As an alternative, it seems the `formulas` argument can be used in place of a predictor matrix. This option seems ideal, as it allows user-defined formulas for each block. Also, if I understand the whole process correctly, the `predictorMatrix` is only passed on to `mice.impute.2lonly.pmm` and not to `mice.impute.ml.lmer`. The question, then, is whether the `formulas` argument can be used to define three-level models using `lme4` syntax, and whether these user-defined models in `formulas` can be passed on to `mice.impute.ml.lmer`. As a more general question, why can't `mice.impute.ml.lmer` be used for imputation at the top level? (At least, it didn't work when I tried.)
- Then there's also an argument `group_index` in `mice.impute.ml.lmer`, used to pass group identifiers to `mice.impute.bygroup`. From reading the documentation I am still unsure what this function actually does, as I can find little information on it. It seems it is designed for grouping variables together by level, but not for grouping variables across different levels, correct? What would distinguish `mice.impute.bygroup` from creating blocks? And what would be the difference between doing this and calling models in `mice.impute.ml.lmer`?
- As for computational efficiency, I have no idea if grouping variables together would increase it. I could really use some advice on this part.
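To make the `formulas` question concrete, this is the kind of specification I have in mind. The names are again placeholders, and the `lme4`-style random-effects terms are hypothetical: whether `mice` accepts them inside `formulas` is exactly what I am asking.

```r
# group the indicators of each latent construct into one multivariate block
blocks <- list(constructA = c("a1", "a2", "a3"),
               constructB = c("b1", "b2"))

# one formula per block; multivariate left-hand sides are joined with "+".
# The (1 | id_l2) and (1 | id_l3) terms are lme4 syntax -- can they be used here,
# and would they reach mice.impute.ml.lmer?
formulas <- list(
  constructA = a1 + a2 + a3 ~ z_l2 + w_top + (1 | id_l2) + (1 | id_l3),
  constructB = b1 + b2 ~ z_l2 + (1 + z_l2 | id_l2)
)

imp <- mice(dat, blocks = blocks, formulas = formulas, m = 5, maxit = 1)
```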