
Conditional PMM routine that excludes (a vector of) observed values from the donor pool

Open gerkovink opened this issue 3 years ago • 1 comments

Might be interesting to include since it comes up as a request quite often.

What

mice.impute.pmm.exclude excludes an observed value, or a vector of observed values, from the donor pool during matching. Hence, these values never appear as imputations, but they still play a role in the imputation procedure.

Why

Sometimes users want to exclude certain observations from ending up in the imputations, without excluding them from the imputation procedure altogether. With mice.impute.pmm.exclude these observed values can still serve as predictor values.
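To make the idea concrete, here is a minimal base-R sketch of predictive mean matching with a restricted donor pool. It is not the code from the branch: the helper name pmm_exclude_sketch is hypothetical, the regression is a plain least-squares fit without the parameter draws that mice uses, and dropping the excluded cases from the fit as well as from the donor pool is an assumption made for simplicity.

```r
# Simplified PMM with an exclusion list (hypothetical helper, not the mice code).
pmm_exclude_sketch <- function(y, ry, x, exclude = NULL, donors = 5L) {
  x <- as.matrix(x)
  # Donor pool: observed cases whose value is not on the exclusion list
  pool <- which(ry & !(y %in% exclude))
  stopifnot(length(pool) > 0)
  # Assumption: fit the regression on the donor pool only
  fit <- stats::lm.fit(cbind(1, x[pool, , drop = FALSE]), y[pool])
  yhat <- drop(cbind(1, x) %*% fit$coefficients)
  # For each missing case, draw one value from the closest donors
  vapply(which(!ry), function(i) {
    d <- abs(yhat[pool] - yhat[i])
    nearest <- pool[order(d)[seq_len(min(donors, length(pool)))]]
    y[nearest[sample.int(length(nearest), 1L)]]
  }, numeric(1))
}

# Toy data: the values 10 and 11 are observed but must not be imputed
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- round(10 + 2 * x[, 1] - x[, 2])
y[sample(100, 20)] <- NA
ry <- !is.na(y)
imp_values <- pmm_exclude_sketch(y, ry, x, exclude = c(10, 11))
any(imp_values %in% c(10, 11))
```

Rows holding an excluded value stay in the data set, so they remain available as predictor values when other variables are imputed.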

Some tests

# to install this
# devtools::install_github(repo = "gerkovink/mice@pmm999")
library(mice)

# TEST 1
# impute without exclude
imp <- mice(nhanes, 
            seed = 123, 
            printFlag = FALSE)
A <- imp$imp$chl

# impute with exclude
meth  <- make.method(nhanes)
meth["chl"] <- "pmm.exclude"
imp <- mice(nhanes, method = meth, exclude = c(218, 187),
            seed = 123, 
            printFlag = FALSE)
B <- imp$imp$chl

any(A == 187 | A == 218) # May be TRUE
#> [1] TRUE 
any(B == 187 | B == 218) # Must be FALSE
#> [1] FALSE 

# TEST 2 - copied from mice.impute.pmm
set.seed(53177)
xname <- c("age", "hgt", "wgt")
r <- stats::complete.cases(boys[, xname])
x <- boys[r, xname]
y <- boys[r, "tv"]
ry <- !is.na(y)

# Impute missing tv data with original pmm
set.seed(123); yimp.pmm <- mice.impute.pmm(y, ry, x)
set.seed(123); yimp <- mice.impute.pmm.exclude(y, ry, x)
identical(yimp, yimp.pmm) # should be TRUE
#> [1] TRUE

set.seed(123); yimp.pmm <- mice.impute.pmm(y, ry, x)
set.seed(123); yimp <- mice.impute.pmm.exclude(y, ry, x, exclude = c(20, 25))
identical(yimp, yimp.pmm) # should be FALSE
#> [1] FALSE
c(20, 25) %in% yimp # should be FALSE twice
#> [1] FALSE FALSE

Created on 2021-05-10 by the reprex package (v1.0.0)

R CMD check

── R CMD check results ────────────────────────────────── mice 3.13.7 ────
Duration: 3m 4.3s

0 errors ✓ | 0 warnings ✓ | 0 notes ✓

R CMD check succeeded

gerkovink, May 10 '21 10:05

This is a useful addition. Two suggestions:

  • Preferably implement this in the standard mice.impute.pmm(..., exclude = c(...)) function to avoid code duplication;
  • In the likely case that you want different exclusions for different variables, use the blots argument to pass down a different exclude vector per variable.
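A sketch of what the second suggestion could look like in practice. It assumes an installed mice version in which mice.impute.pmm() accepts an exclude argument, as proposed in this thread; the choice of variables and excluded values is purely illustrative.

```r
library(mice)

# Assumption: mice.impute.pmm() in the installed version supports `exclude`.
# Illustrative choice: never impute the largest observed bmi value.
top_bmi <- max(nhanes$bmi, na.rm = TRUE)

# blots is a named list; each entry is passed down to the imputation
# function of the block (here: variable) with that name.
blots <- list(
  chl = list(exclude = c(218, 187)),
  bmi = list(exclude = top_bmi)
)

imp <- mice(nhanes, blots = blots, seed = 123, printFlag = FALSE)

chl_imp <- unlist(imp$imp$chl)
bmi_imp <- unlist(imp$imp$bmi)
any(chl_imp %in% c(218, 187)) # should be FALSE
any(bmi_imp == top_bmi)       # should be FALSE
```

Compared with passing exclude directly to mice(), which hands the same vector to every imputation function, the blots route keeps each exclusion list tied to its own variable.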

stefvanbuuren, Sep 23 '21 20:09

Moving this over to another branch. Closed by #519.

gerkovink, Nov 10 '22 14:11