ggmice
Plot variance of predicted values after imputation
gerko's new var plot idea
@KyuriP Thank you for your contribution! As discussed, I've implemented your code in the existing ggmice function plot_variance(). To keep the function flexible with respect to the different analyses a mira object can contain, the observed data is not plotted. Instead, the row number is plotted on the y axis, just as for mids objects visualized with this function:
library(ggmice)
imp <- mice::mice(mice::nhanes, printFlag = FALSE)
plot_variance(imp)
Created on 2023-03-27 with reprex v2.0.2
library(ggmice)
imp <- mice::mice(mice::nhanes, printFlag = FALSE)
fit <- mice:::with.mids(imp, lm(bmi~age))
plot_variance(fit)
Created on 2023-03-27 with reprex v2.0.2
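A minimal base R sketch of the per-row quantity such a plot could be built from, assuming it is the across-imputation variance of each row's predicted value. predicted_variance() and toy_fits are hypothetical names, and the mtcars fits only stand in for the m analyses of a mira object; this is not the ggmice implementation.

```r
# Hypothetical helper (not ggmice code): per-row mean and variance of the
# predicted values across the m analyses of a multiply imputed fit.
# Assumes `fits` is a plain list of lm objects fitted to completed datasets
# that share the same rows, e.g. the `analyses` element of a mira object.
predicted_variance <- function(fits) {
  # n x m matrix: one row per observation, one column per imputation
  pred <- do.call(cbind, lapply(fits, stats::fitted))
  data.frame(
    row = seq_len(nrow(pred)),             # row number, as on the y axis
    mean_pred = rowMeans(pred),            # average predicted value
    var_pred = apply(pred, 1, stats::var)  # across-imputation variance
  )
}

# Toy stand-in for m = 5 imputations (hypothetical): the first three mpg
# values are treated as missing and "imputed" differently each time.
toy_fits <- lapply(1:5, function(i) {
  d <- mtcars
  d$mpg[1:3] <- mean(mtcars$mpg) + (i - 3)
  stats::lm(mpg ~ wt, data = d)
})

res <- predicted_variance(toy_fits)
head(res)
```

For a mira object such as fit in the reprex above, the corresponding call would be predicted_variance(fit$analyses). In this toy example, changing a few imputed outcomes shifts the regression coefficients, so every row's prediction varies across imputations, not only the rows that were imputed.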
Adjustments/suggestions are welcome! (cc @gerkovink)
Use broom to extract the residuals, and plot the predicted values against the observed data (and the average imputed data) instead.
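The broom suggestion can be sketched without the broom dependency: for an lm fit, broom::augment() returns the columns .fitted and .resid, which base R's fitted() and residuals() also provide. The sketch below stacks them over imputations and, because fitted plus residual recovers the response in each completed dataset, averages that response per row. For rows with an observed outcome this average equals the observed value; for rows with a missing outcome it is the average imputed value. augment_all(), average_response(), and the cars stand-in data are hypothetical, not ggmice code.

```r
# Hypothetical helpers (not ggmice code). `fits` is assumed to be a plain
# list of lm objects fitted to completed datasets with the same rows.
augment_all <- function(fits) {
  do.call(rbind, lapply(seq_along(fits), function(i) {
    f <- fits[[i]]
    data.frame(
      .imp = i,                               # imputation number
      .row = seq_along(stats::fitted(f)),     # row number
      .fitted = unname(stats::fitted(f)),     # predicted value
      .resid = unname(stats::residuals(f))    # residual
    )
  }))
}

# For lm, response = fitted + residual, so this recovers the outcome used in
# each completed dataset; averaging over imputations gives one value per row.
average_response <- function(aug) {
  aug$response <- aug$.fitted + aug$.resid
  stats::aggregate(cbind(response, .fitted) ~ .row, data = aug, FUN = mean)
}

# Toy stand-in for m = 5 imputations (hypothetical): the first two dist
# values are treated as missing and filled differently each time.
toy_fits <- lapply(1:5, function(i) {
  d <- cars
  d$dist[1:2] <- mean(cars$dist) + (i - 3) * 5
  stats::lm(dist ~ speed, data = d)
})

aug <- augment_all(toy_fits)
avg <- average_response(aug)
head(avg)
```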
Thanks a bunch!! I still have two questions:
- Why is the variability scale categorical rather than continuous?
- Are the warnings expected behaviour?
library(ggmice)
library(mice)
#>
#> Attaching package: 'mice'
#> The following objects are masked from 'package:ggmice':
#>
#> bwplot, densityplot, stripplot, xyplot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
mira <- with(mice(nhanes, print = FALSE), lm(bmi~chl))
plot_variance(mira)
#> Warning: Removed 9 rows containing missing values (`geom_point()`).
Created on 2023-04-11 with reprex v2.0.2
One issue: this function does not support the mild workflow (e.g. purrr::map()) advocated here.
library(mice, warn.conflicts = FALSE)
library(ggmice, warn.conflicts = FALSE)
library(magrittr)
library(purrr)
# mild workflow with purrr::map()
mild_mira <-
nhanes %>%
mice(print = FALSE) %>%
complete("all") %>%
map(~.x %$% lm(bmi~chl))
plot_variance(mild_mira)
#> Error in plot_variance(mild_mira): Input is not a Multiply Imputed Data Set of class `mids`/ `mira`.
#>
#> Perhaps function mice::as.mids() can be of use?
This error message is somewhat informative, but not sufficiently so: it should point towards with_mids(). On the other hand, we also advocate the mapped workflow in mice, so we should support that workflow in ggmice as well.
The with() workflow works without fail:
# regular workflow
mira <- with(mice(nhanes,
print = FALSE),
lm(bmi~chl))
plot_variance(mira)
#> Warning: Removed 9 rows containing missing values (`geom_point()`).
Now the interesting thing is that mild_mira has the same structure as the analyses element of mira (a list of m fitted lm objects); mira merely wraps that list with call information and a mira class. mice::pool() does not care about this difference, so perhaps we can borrow a solution from that function:
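mice exports as.mira(), which turns a plain list of fitted models into a mira object, so plot_variance() could normalise its input along the same lines before plotting. A minimal base R sketch, assuming only lm/glm fits need to be supported; get_analyses() is a hypothetical name, and its error message (pointing to with() and mice::as.mira()) is a suggestion, not existing ggmice output.

```r
# Hypothetical helper (not ggmice code): accept either a `mira` object
# (from with(imp, ...)) or a plain list of fitted models (from the mild
# workflow with complete("all") + purrr::map()) and return the list of fits.
get_analyses <- function(x) {
  if (inherits(x, "mira")) {
    return(x$analyses)
  }
  # A plain, non-empty list whose elements are all lm (or glm) fits
  is_fit_list <- is.list(x) && !is.data.frame(x) && length(x) > 0 &&
    all(vapply(x, inherits, logical(1), what = "lm"))
  if (is_fit_list) {
    return(unname(x))
  }
  stop(
    "Input is not a `mira` object or a list of fitted models. ",
    "Fit the model with `with(imp, lm(...))` on a `mids` object, ",
    "or convert a list of fits with `mice::as.mira()`.",
    call. = FALSE
  )
}

# Illustration only: a hand-built object of class "mira" and a named plain
# list of fits, like the one the mild workflow returns
fits <- list(stats::lm(mpg ~ wt, data = mtcars), stats::lm(mpg ~ hp, data = mtcars))
fake_mira <- structure(list(analyses = fits), class = "mira")
mild_like <- setNames(fits, c("1", "2"))

length(get_analyses(fake_mira))
length(get_analyses(mild_like))
```

Inside plot_variance(), a call such as fits <- get_analyses(data) would then let both the with() and the mild workflows reach the same plotting code.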
# pooling
pool(mira)
#> Class: mipo m = 5
#> term m estimate ubar b t dfcom
#> 1 (Intercept) 5 20.35823098 1.530174e+01 5.1736784106 2.151015e+01 23
#> 2 chl 5 0.03256681 4.122791e-04 0.0001760493 6.235383e-04 23
#> df riv lambda fmi
#> 1 11.48917 0.4057326 0.2886272 0.3868209
#> 2 10.00654 0.5124178 0.3388070 0.4404779
pool(mild_mira)
#> Class: mipo m = 5
#> term m estimate ubar b t dfcom
#> 1 (Intercept) 5 22.77324050 1.472193e+01 4.800490e-01 1.529799e+01 23
#> 2 chl 5 0.01958118 3.950836e-04 2.448063e-05 4.244603e-04 23
#> df riv lambda fmi
#> 1 20.28439 0.03912930 0.03765585 0.1203159
#> 2 19.30457 0.07435579 0.06920965 0.1526715
Created on 2023-04-12 with reprex v2.0.2
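The columns of both pool() outputs are related by Rubin's rules: total variance t = ubar + (1 + 1/m) * b, riv = (1 + 1/m) * b / ubar, and lambda = (1 + 1/m) * b / t. The short check below recomputes them from the printed ubar and b of pool(mira); it is only an arithmetic illustration. The two pooled results differ from each other because neither mice() call sets a seed, so each run draws different imputations.

```r
# Values copied from the pool(mira) output above (m = 5 imputations)
m <- 5
ubar <- c(intercept = 1.530174e+01, chl = 4.122791e-04)  # within-imputation variance
b    <- c(intercept = 5.1736784106, chl = 0.0001760493)  # between-imputation variance

total  <- ubar + (1 + 1 / m) * b    # total variance (column t)
riv    <- (1 + 1 / m) * b / ubar    # relative increase in variance due to nonresponse
lambda <- (1 + 1 / m) * b / total   # proportion of total variance due to missingness

round(rbind(total, riv, lambda), 7)
```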
Nice work, @KyuriP! One more thing: could you maybe change the discrete scale for the variance to a continuous one, to match the plot_variance() output for dataframes?
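In ggplot2 the scale type follows the type of the mapped column: a numeric variance gets a continuous colour (or fill) scale, while a factor or character column, for example one produced by cut(), gets a discrete one. Whether that is what happened in plot_variance() is an assumption; the toy values below are hypothetical, not ggmice output.

```r
library(ggplot2)

# Toy per-row values (hypothetical, not ggmice output)
df <- data.frame(
  row = 1:6,
  pred = c(20.1, 22.4, 25.0, 26.8, 29.9, 31.2),
  variance = c(0, 0.4, 2.1, 0, 3.5, 1.2)
)

# Numeric column: ggplot2 uses a continuous colour scale (gradient colourbar)
p_cont <- ggplot(df, aes(x = pred, y = row, colour = variance)) +
  geom_point(size = 3) +
  scale_colour_gradient(low = "grey85", high = "darkred")

# Factor column (e.g. after cut()): ggplot2 switches to a discrete scale
df$variance_bin <- cut(df$variance, breaks = c(-Inf, 0, 1, Inf))
p_disc <- ggplot(df, aes(x = pred, y = row, colour = variance_bin)) +
  geom_point(size = 3)
```

Keeping the variance numeric, and choosing the gradient with a scale_*_gradient() or scale_*_viridis_c() call, yields the continuous legend.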
Okay, now there is just one error message left in the example and one NOTE remaining :)
checking R code for possible problems ... NOTE
plot_variance: no visible binding for global variable '.'
plot_variance: no visible binding for global variable 'm'
plot_variance: no visible binding for global variable '.fitted'
plot_variance: no visible binding for global variable '.resid'
plot_variance: no visible binding for global variable 'avg'
plot_variance: no visible binding for global variable 'observed'
plot_variance: no visible binding for global variable 'vrn'
Undefined global functions or variables:
. .fitted .resid avg m observed vrn
0 errors ✔ | 0 warnings ✔ | 1 note ✖
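One common way to clear this NOTE is to declare the column names used through non-standard evaluation with utils::globalVariables(), typically in a separate file of the package. A minimal sketch; the file name R/globals.R is an assumption, and the variable names are copied from the NOTE.

```r
# Hypothetical file R/globals.R in the package source (file name is an
# assumption). Declaring the names that plot_variance() refers to via
# non-standard evaluation stops R CMD check from flagging them as
# undefined global variables.
utils::globalVariables(
  c(".", "m", ".fitted", ".resid", "avg", "observed", "vrn")
)
```

For data-frame columns (though not for the magrittr .), an alternative is the .data pronoun that ggplot2 re-exports from rlang, e.g. aes(x = .data$.fitted).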
Optional: add a 'perfect prediction' line with geom_abline(intercept = 0, slope = 1)?
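A minimal ggplot2 sketch of the suggested reference line on toy predicted-versus-observed values (hypothetical data, not ggmice output). coord_equal() is an optional extra that keeps one unit equally long on both axes, so the slope-1 line is drawn at 45 degrees.

```r
library(ggplot2)

# Toy predicted vs. observed values (hypothetical, not ggmice output)
df <- data.frame(
  observed = c(20.4, 22.7, 25.5, 27.2, 30.1, 33.2),
  predicted = c(21.0, 23.1, 24.8, 27.9, 29.5, 31.8)
)

p <- ggplot(df, aes(x = observed, y = predicted)) +
  geom_point() +
  # Points on this line would be perfectly predicted
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  coord_equal()
```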
Another addition: the vrb argument that all other ggmice functions have!