Plot variance of predicted values after imputation

Open KyuriP opened this issue 1 year ago • 9 comments

gerko's new var plot idea

KyuriP avatar Mar 23 '23 16:03 KyuriP

@KyuriP Thank you for your contribution! As discussed, I've implemented your code in the existing ggmice function plot_variance(). To maintain flexibility with respect to the different analyses that mira objects can hold, the observed data is not plotted. Instead, the row number is plotted on the y axis, just as for mids objects visualized with this function:

library(ggmice)
imp <- mice::mice(mice::nhanes, printFlag = FALSE)
plot_variance(imp)

Created on 2023-03-27 with reprex v2.0.2

library(ggmice)
imp <- mice::mice(mice::nhanes, printFlag = FALSE)
fit <- mice:::with.mids(imp, lm(bmi~age))
plot_variance(fit)

Created on 2023-03-27 with reprex v2.0.2

Adjustments/suggestions are welcome! (cc @gerkovink )

hanneoberman avatar Mar 27 '23 16:03 hanneoberman

use broom to extract residuals and plot the predicted values against the observed data (and average imputed data) instead
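A minimal sketch of this idea, assuming the broom package is installed. It is not the ggmice implementation: it only shows how broom::augment() yields the .fitted and .resid columns for each analysis in a mira object, and how the variance of the predicted values across imputations could be computed per row. The names aug, fitted_mat and row_variance are made up for illustration.

```r
library(mice, warn.conflicts = FALSE)

# Impute and fit the same model as in the example above
imp <- mice(nhanes, printFlag = FALSE, seed = 123)
fit <- with(imp, lm(bmi ~ age))

# One augmented data frame per imputed data set; each contains the
# model variables plus .fitted and .resid (among other columns)
aug <- lapply(fit$analyses, broom::augment)

# Rows x imputations matrix of predicted values
fitted_mat <- sapply(aug, function(d) d$.fitted)

# Between-imputation variance of the predicted value for each row
row_variance <- apply(fitted_mat, 1, var)
head(row_variance)
```

The resulting vector has one entry per row of the data, so it can be joined to the observed (or average imputed) values before plotting.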

hanneoberman avatar Mar 29 '23 11:03 hanneoberman

Thanks a bunch!! I do have two questions still:

  • why is the scale of the variability categorical, and not continuous?
  • are the warnings expected behaviour?

library(ggmice)
library(mice)
#> 
#> Attaching package: 'mice'
#> The following objects are masked from 'package:ggmice':
#> 
#>     bwplot, densityplot, stripplot, xyplot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
mira <- with(mice(nhanes, print = FALSE), lm(bmi~chl))
plot_variance(mira)
#> Warning: Removed 9 rows containing missing values (`geom_point()`).

Created on 2023-04-11 with reprex v2.0.2

hanneoberman avatar Apr 11 '23 07:04 hanneoberman

One issue: this function does not allow for the mild workflow (e.g. purrr::map()) advocated here.

library(mice, warn.conflicts = FALSE)
library(ggmice, warn.conflicts = FALSE)
library(magrittr)
library(purrr)

# mild workflow with purrr::map()
mild_mira <- 
  nhanes %>% 
  mice(print = FALSE) %>% 
  complete("all") %>% 
  map(~.x %$% lm(bmi~chl))
plot_variance(mild_mira)
#> Error in plot_variance(mild_mira): Input is not a Multiply Imputed Data Set of class `mids`/ `mira`. 
#> 
#>          Perhaps function mice::as.mids() can be of use?

This error message is somewhat informative, but not informative enough: it should point towards with.mids(). On the other hand, we also advocate the mapped workflow in mice, so ggmice should support that workflow as well.

The with workflow works without fail:

# regular workflow
mira <- with(mice(nhanes, 
                  print = FALSE), 
             lm(bmi~chl))
plot_variance(mira)
#> Warning: Removed 9 rows containing missing values (`geom_point()`).

Now the interesting thing is that both mild_mira and mira have the exact same list structure, minus the call and class info. mice::pool() does not care about this difference, so perhaps we can borrow a solution from that functionality:


# pooling
pool(mira)
#> Class: mipo    m = 5 
#>          term m    estimate         ubar            b            t dfcom
#> 1 (Intercept) 5 20.35823098 1.530174e+01 5.1736784106 2.151015e+01    23
#> 2         chl 5  0.03256681 4.122791e-04 0.0001760493 6.235383e-04    23
#>         df       riv    lambda       fmi
#> 1 11.48917 0.4057326 0.2886272 0.3868209
#> 2 10.00654 0.5124178 0.3388070 0.4404779
pool(mild_mira)
#> Class: mipo    m = 5 
#>          term m    estimate         ubar            b            t dfcom
#> 1 (Intercept) 5 22.77324050 1.472193e+01 4.800490e-01 1.529799e+01    23
#> 2         chl 5  0.01958118 3.950836e-04 2.448063e-05 4.244603e-04    23
#>         df        riv     lambda       fmi
#> 1 20.28439 0.03912930 0.03765585 0.1203159
#> 2 19.30457 0.07435579 0.06920965 0.1526715

Created on 2023-04-12 with reprex v2.0.2
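One possible way to bridge the two workflows, sketched under assumptions: mice exports as.mira(), which wraps a plain list of fitted models in a mira object. The snippet below only demonstrates that coercion. Whether plot_variance() then accepts the result, or should perform this coercion internally, is not confirmed anywhere in this thread. The names fit_list and fit_mira are made up for illustration, and base lapply() stands in for purrr::map().

```r
library(mice, warn.conflicts = FALSE)

imp <- mice(nhanes, printFlag = FALSE, seed = 123)

# Mapped workflow: a plain list of lm fits, one per completed data set
fit_list <- lapply(complete(imp, "all"), function(d) lm(bmi ~ chl, data = d))
class(fit_list)

# Coerce the list of fits to a mira object
fit_mira <- as.mira(fit_list)
class(fit_mira)

# Both forms can be pooled
pool(fit_list)
pool(fit_mira)
```

A plotting function that checks is.mira() on its input could apply the same coercion to list input before continuing, instead of raising an error.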

gerkovink avatar Apr 12 '23 09:04 gerkovink

Nice work, @KyuriP! One more thing: could you maybe change the discrete scale for the variance to a continuous one to match the plot_variance() output for dataframes?

hanneoberman avatar Apr 13 '23 07:04 hanneoberman

Okay, now there is just 1 error message in the example and 1 NOTE remaining :)

checking R code for possible problems ... NOTE
  plot_variance: no visible binding for global variable '.'
  plot_variance: no visible binding for global variable 'm'
  plot_variance: no visible binding for global variable '.fitted'
  plot_variance: no visible binding for global variable '.resid'
  plot_variance: no visible binding for global variable 'avg'
  plot_variance: no visible binding for global variable 'observed'
  plot_variance: no visible binding for global variable 'vrn'
  Undefined global functions or variables:
    . .fitted .resid avg m observed vrn

0 errors ✔ | 0 warnings ✔ | 1 note ✖
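A common way to silence this kind of NOTE, shown here as a sketch rather than as the fix that was actually applied: declare the non-standard-evaluation names with utils::globalVariables() in a package source file. The file name R/globals.R and the object name ggmice_globals are assumptions for illustration.

```r
# R/globals.R (hypothetical file name)
# Names used through non-standard evaluation inside plot_variance(),
# copied from the R CMD check NOTE above
ggmice_globals <- c(".", "m", ".fitted", ".resid", "avg", "observed", "vrn")

# Tell R CMD check that these names are intentional
utils::globalVariables(ggmice_globals)
```

An alternative that avoids global declarations is to refer to columns through the .data pronoun (for example .data$avg) inside dplyr and ggplot2 calls.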

hanneoberman avatar Apr 13 '23 15:04 hanneoberman

Optional: add a 'perfect prediction' line with geom_abline(intercept = 0, slope = 1)?
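A small sketch of what such a reference line looks like, using made-up toy data rather than the actual plot_variance() output. The data frame toy and its columns observed and avg are hypothetical stand-ins for the observed values and the average predicted values.

```r
library(ggplot2)

# Toy data: 'avg' is 'observed' plus some noise (illustration only)
set.seed(1)
toy <- data.frame(observed = rnorm(25, mean = 26, sd = 4))
toy$avg <- toy$observed + rnorm(25, sd = 2)

p <- ggplot(toy, aes(x = observed, y = avg)) +
  geom_point() +
  # Points on this line are predicted exactly (y = x)
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(x = "Observed value", y = "Average predicted value")
p
```

Points far from the dashed line are those where the prediction deviates most from the observed value.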

hanneoberman avatar Apr 13 '23 15:04 hanneoberman

Another addition: the vrb argument that all other ggmice functions have!

hanneoberman avatar Jun 07 '23 15:06 hanneoberman