
multicore collects children inappropriately

[Open] mtmorgan opened this issue 5 years ago • 6 comments

This script

library(batchtools)
res <- NULL
registry <- makeRegistry(tempfile())
registry$cluster.functions <- makeClusterFunctionsMulticore(2); gc()
ids = batchMap(identity, 1:2, more.args = list(), reg = registry); gc()
ids$chunk = chunk(ids$job.id, 2); gc()
submitJobs(ids = ids, reg = registry); gc()
waitForJobs(ids = ids, reg = registry); gc()
res <- reduceResultsList(ids = ids, reg = registry); gc()
clearRegistry(reg=registry); gc()

generates warnings like

> waitForJobs(ids = ids, reg = registry); gc()

[1] TRUE
Warning messages:
1: In selectChildren(jobs, timeout) :
  cannot wait for child 59480 as it does not exist
2: In selectChildren(jobs, timeout) :
  cannot wait for child 59481 as it does not exist
          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells  655286 35.0    1399907 74.8         NA  1399907 74.8
Vcells 1349214 10.3    8388608 64.0      32768  3190967 24.4

mtmorgan, Feb 11 '19 12:02

I cannot reproduce this on Arch Linux with R version 3.5.2. Can you provide a sessionInfo()?

Also, I'm not sure how to solve this. I see no way to collect the results from the forked processes except by calling mccollect(). I could suppress the warning, but that is more of a workaround than a solution.

mllg, Feb 11 '19 20:02

> sessionInfo()
R Under development (unstable) (2019-02-08 r76071)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] batchtools_0.9.11 data.table_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        prettyunits_1.0.2 withr_2.1.2       digest_0.6.18
 [5] crayon_1.3.4      assertthat_0.2.0  rappdirs_0.3.1    R6_2.3.0
 [9] backports_1.1.3   magrittr_1.5      rlang_0.3.1       progress_1.2.0
[13] stringi_1.2.4     fs_1.2.6          brew_1.0-6        checkmate_1.9.1
[17] tools_3.6.0       hms_0.4.2         parallel_3.6.0    compiler_3.6.0
[21] pkgconfig_2.0.2   base64url_1.4

I don't think it's mccollect per se, but rather the finalizer on the R6 class running at the wrong time. I don't know enough about R6 classes to help further.

mtmorgan, Feb 11 '19 20:02

I had a similar issue in the past and, if I remember correctly, the warning listed PIDs different from the ones I was trying to collect.

If that is the case here, I think suppressing them and making sure the requested PIDs are cleaned up (e.g. using tools::pskill) is the right approach. (Happy to be corrected by Martin or anyone else!)
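A minimal R sketch of the suppression half of that approach, for illustration only. `muffle_missing_child()` is a hypothetical helper (not part of batchtools or parallel) that uses base R's `withCallingHandlers()` and the `muffleWarning` restart to silence only warnings matching "cannot wait for child ... as it does not exist", while letting every other warning through. It assumes an English locale, since R translates warning messages; the PID cleanup step (e.g. `tools::pskill`) is not shown.

```r
library(parallel)

# Hypothetical helper, not part of batchtools or parallel: evaluate `expr`
# and muffle only the "cannot wait for child ... as it does not exist"
# warning. All other warnings propagate unchanged.
# Assumes an English locale; translated message text will not match.
muffle_missing_child <- function(expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl("cannot wait for child [0-9]+ as it does not exist",
                conditionMessage(w))) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Usage on a Unix-alike (forking is not available on Windows):
if (.Platform$OS.type == "unix") {
  job <- mcparallel(sqrt(16))
  res <- muffle_missing_child(mccollect(job))
  print(res)
}
```

Matching on the message text keeps unrelated warnings visible, which is the main difference from wrapping the call in `suppressWarnings()`.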

mschubert, Feb 18 '19 12:02

This warning, "1: In selectChildren(jobs, timeout) : cannot wait for child 59480 as it does not exist", is a showstopper for multicore batchtools on CentOS. Do you need more details?

vjcitn, Nov 15 '19 21:11

The multicore code in R has changed very often in recent releases, so I'm not sure there is a generic solution. I've made a small fix for R-3.6.x which should reduce the number of warnings (8d471e128f7bc51399da516fdf35bde7d02f34c1). Does this help?

mllg, Nov 19 '19 15:11

Hopefully, @mllg's commit fixes this problem, but if not ...

All y'all, the warning "cannot wait for child NNNNN as it does not exist" was introduced in R 3.5.0. Some bugs caused this warning to be raised even when it shouldn't have been, and this could be reproduced using the 'parallel' package alone. That particular problem was fixed in R 3.5.2.

For those who report seeing this warning, please make sure to share (a) which version of R you are using and (b) which operating system you are on. Sharing your sessionInfo() covers both of these and more. If you're using R (>= 3.5.0 & < 3.5.2), then that's why you get the warning.

If it turns out that there is still a bug in R itself, it would be awesome to narrow this down so that it can be resolved there.
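One way to take batchtools out of the picture is a parallel-only check like the R sketch below. It forks two trivial children with `mcparallel()`, collects them with `mccollect()`, and records (rather than prints) any warnings raised, together with the R version and platform. It is an illustration, not a confirmed reproducer of the original bug, and it runs on Unix-alikes only because `mcparallel()` relies on forking.

```r
library(parallel)

# Fork two trivial children and collect them using only the parallel
# package, recording any warnings instead of printing them.
warnings_seen <- character()
res <- withCallingHandlers(
  {
    jobs <- list(mcparallel(1L), mcparallel(2L))
    mccollect(jobs)
  },
  warning = function(w) {
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# Report what is needed for triage: R version, platform, warnings seen.
cat("R version:", as.character(getRversion()), "\n")
cat("Platform: ", R.version$platform, "\n")
cat("Warnings: ", length(warnings_seen), "\n")
for (msg in warnings_seen) cat("  ", msg, "\n")
```

If this prints any "cannot wait for child" warnings on a given R version and OS, that output plus sessionInfo() would point at R itself rather than batchtools.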

My $.02

HenrikBengtsson, Nov 19 '19 16:11