parlmice from within a function

Open sam-crawley opened this issue 4 years ago • 6 comments

It seems some strange things happen with environments when parlmice is wrapped in a function, e.g.

library(mice)
test.parlmice <- function() {
  dat <- nhanes
  
  parlmice(dat, cl.type='FORK', maxit = 5, n.core = 2, n.imp.core = 2)
}

Calling test.parlmice() results in:

Error in get(name, envir = envir) : object 'dat' not found

The environment is still broken, even after the parlmice call, e.g.

test.parlmice2 <- function(someVal = T) {
  parlmice(nhanes, cl.type='FORK', maxit = 5, n.core = 2, n.imp.core = 2)
  
  print(someVal)
}

Result:

Error in get(name, envir = envir) : object 'someVal' not found

Session info:

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C               LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8     LC_MONETARY=en_NZ.UTF-8   
 [6] LC_MESSAGES=en_NZ.UTF-8    LC_PAPER=en_NZ.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mice_3.6.0      lattice_0.20-38

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        rstudioapi_0.10   magrittr_1.5      splines_3.6.1     MASS_7.3-51.4     tidyselect_0.2.5  R6_2.4.0          rlang_0.4.0      
 [9] jomo_2.6-9        minqa_1.2.4       dplyr_0.8.3       tools_3.6.1       parallel_3.6.1    nnet_7.3-12       grid_3.6.1        mitml_0.3-7      
[17] broom_0.5.2       nlme_3.1-141      pan_1.6           survival_2.44-1.1 yaml_2.2.0        lme4_1.1-21       assertthat_0.2.1  tibble_2.1.3     
[25] crayon_1.3.4      Matrix_1.2-17     nloptr_1.2.1      purrr_0.3.2       tidyr_0.8.3       rpart_4.1-15      glue_1.3.1        compiler_3.6.1   
[33] pillar_1.4.2      generics_0.0.2    backports_1.1.4   boot_1.3-23       pkgconfig_2.0.2  

sam-crawley • Aug 21 '19 03:08

To solve this, I've had to modify the parlmice function:

envir <- environment()
# copy the caller's variables into this environment (plain call, no magrittr pipe needed)
appendEnv(envir, parent_envir)

cl <- parallel::makeCluster(n.core, type = cl.type)
# ls() already returns the variable names; names(ls(envir)) would be NULL
parallel::clusterExport(cl,
                        varlist = ls(envir),
                        envir = envir)
parallel::clusterExport(cl,
                        varlist = "do.call")
parallel::clusterEvalQ(cl, library(mice))
if (!is.na(cluster.seed)) {
  parallel::clusterSetRNGStream(cl, cluster.seed)
}

imps <- parallel::parLapply(cl = cl, X = 1:n.core, function(x) do.call(mice, as.list(args), envir = envir))

with

appendEnv <- function(e1, e2) {
  listE1 <- ls(e1)
  listE2 <- ls(e2)
  for (v in listE2) {
    if (v %in% listE1) warning(sprintf("Variable %s is in e1, too!", v))
    e1[[v]] <- e2[[v]]
  }
}

and parent_envir my parent environment.
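The post does not say how parent_envir is obtained. A minimal sketch, assuming it is the caller's frame from parent.frame() (an assumption, not confirmed by the author), shows the copying step in isolation. Here patched() and wrapper() are hypothetical stand-ins, not the real parlmice():

```r
# appendEnv() as posted above: copy every variable of e2 into e1.
appendEnv <- function(e1, e2) {
  for (v in ls(e2)) {
    if (v %in% ls(e1)) warning(sprintf("Variable %s is in e1, too!", v))
    e1[[v]] <- e2[[v]]
  }
}

# Hypothetical stand-in for the patched function (NOT the real parlmice()).
patched <- function(data) {
  parent_envir <- parent.frame()   # assumption: the caller's frame
  envir <- environment()
  appendEnv(envir, parent_envir)
  # check that each of the caller's variables now exists directly in `envir`
  vapply(ls(parent_envir), exists, logical(1), envir = envir, inherits = FALSE)
}

wrapper <- function() {
  dat <- data.frame(x = 1:3)
  someVal <- TRUE
  patched(dat)
}

found <- wrapper()
```

After the copy, every name listed in the caller's frame can be resolved from envir, which is what the export step needs.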

vwrobel • Oct 07 '19 13:10

I'll look into it. Thanks.

All the best,

Gerko

gerkovink • Oct 07 '19 13:10

I ran into this problem in 2021. Any advances in this direction?

Santos22903 • Feb 03 '21 19:02

I just ran into this problem as well.

awmercer • Jul 25 '21 19:07

@vwrobel I too have run into this problem. I am trying to call parlmice from within another function.

Just to clarify, when you say that parent_envir is your parent environment, are you referring to parent_envir <- parent.frame()?

tim9800 • Oct 28 '21 06:10

I ran into the same problem today under R 4.1.2 on Windows 10 and Ubuntu 20.04.3 LTS. I found a "hack" that gets around the error reported by @sam-crawley: implement a wrapper function like the one below, whose argument names are the same as those used inside the parlmice() function. parlmice_wrapper() can then be called from any scope of an R script.

parlmice_wrapper <- function(data, m, cluster.seed, n.core, n.imp.core) {
  result <- parlmice(data = data, m = m, cluster.seed = cluster.seed,
                 n.core = n.core, n.imp.core = n.imp.core, maxit = 5)
  return(result)
}

Because the wrapper's variables have the same names as parlmice's own arguments, the names collected from the parent scope can be resolved inside parlmice, so all arguments are passed correctly to the cluster. The problematic part in the current source is

  # make computing cluster
  cl <- parallel::makeCluster(n.core, type = cl.type)
  parallel::clusterExport(cl,
    varlist = c(
      "data", "m", "seed", "cluster.seed",
      "n.core", "n.imp.core", "cl.type",
      ls(parent.frame())
    ),
    envir = environment()
  )
  parallel::clusterExport(cl,
    varlist = "do.call"
  )
  parallel::clusterEvalQ(cl, library(mice))
  if (!is.na(cluster.seed)) {
    parallel::clusterSetRNGStream(cl, cluster.seed)
  }

  # generate imputations
  imps <- parallel::parLapply(cl = cl, X = 1:n.core, function(x) do.call(mice, as.list(args), envir = environment()))

as suggested by @vwrobel. I think his solution could do the trick, so that you don't need a hack like mine.
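A minimal base-R sketch of what seems to go wrong in the quoted source: the variable names are collected from the caller's frame with ls(parent.frame()), but they are looked up starting from the function's own environment. The helpers lookup_all(), callee(), caller_fails() and caller_works() are hypothetical and only mimic the get() lookup that clusterExport performs; no cluster is created.

```r
# Hypothetical stand-in for the export step: resolve every name in `varlist`
# starting from `envir`, using get() as in the reported error message.
lookup_all <- function(varlist, envir) {
  for (name in varlist) get(name, envir = envir)
  invisible(TRUE)
}

# Mimics the quoted source: names come from the caller's frame,
# but the lookup starts in the callee's own environment.
callee <- function(data) {
  vars <- c("data", ls(parent.frame()))
  lookup_all(vars, envir = environment())
}

caller_fails <- function() {
  dat <- 1:3    # callee() has no variable called `dat`
  callee(dat)
}

caller_works <- function() {
  data <- 1:3   # same name as callee()'s argument, so the lookup succeeds
  callee(data)
}

res_fail <- tryCatch(caller_fails(), error = function(e) conditionMessage(e))
res_ok <- caller_works()
```

Under these assumptions, caller_fails() raises an "object not found" error for dat, while caller_works() succeeds only because its local variable happens to share the argument's name, which is the effect the wrapper hack relies on. It would also explain the someVal error in the original report: someVal is listed in the caller's frame, so the failure happens during the parlmice call rather than afterwards.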

JoshuaSimon • Jan 05 '22 19:01

We are going to retire parlmice() in favour of futuremice() available in mice 3.14.12.

Please reopen if this problem persists in futuremice().
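For anyone checking whether the problem persists, here is a hedged sketch of the original reproduction rewritten for futuremice(). It assumes a mice version that provides futuremice() and that the future, furrr and parallelly packages are installed; the function name impute_in_function and the seed value are illustrative only.

```r
# Sketch only: the call runs when mice (with futuremice()) and its
# parallel helper packages are installed; otherwise `imp` stays NULL.
impute_in_function <- function() {
  dat <- mice::nhanes   # local variable, as in the original report
  mice::futuremice(dat, m = 2, maxit = 5, n.core = 2, parallelseed = 123)
}

have_pkgs <- all(vapply(c("mice", "future", "furrr", "parallelly"),
                        requireNamespace, logical(1), quietly = TRUE))

imp <- if (have_pkgs && exists("futuremice", envir = asNamespace("mice"))) {
  impute_in_function()
} else {
  NULL
}
```

If the call returns a mids object instead of the "object 'dat' not found" error, the wrapped call is working in that setup.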

stefvanbuuren • Nov 14 '22 15:11