batchtools
multicore collects children inappropriately
This script
library(batchtools)
res <- NULL
registry <- makeRegistry(tempfile())
registry$cluster.functions <- makeClusterFunctionsMulticore(2); gc()
ids = batchMap(identity, 1:2, more.args = list(), reg = registry); gc()
ids$chunk = chunk(ids$job.id, 2); gc()
submitJobs(ids = ids, reg = registry); gc()
waitForJobs(ids = ids, reg = registry); gc()
res <- reduceResultsList(ids = ids, reg = registry); gc()
clearRegistry(reg=registry); gc()
generates warnings like
> waitForJobs(ids = ids, reg = registry); gc()
[1] TRUE
Warning messages:
1: In selectChildren(jobs, timeout) :
cannot wait for child 59480 as it does not exist
2: In selectChildren(jobs, timeout) :
cannot wait for child 59481 as it does not exist
          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells  655286 35.0    1399907 74.8         NA  1399907 74.8
Vcells 1349214 10.3    8388608 64.0      32768  3190967 24.4
I cannot reproduce this on Arch Linux with R version 3.5.2. Can you provide a sessionInfo()?
Also, I'm not sure how to solve this. I see no way to collect the results from the forked processes except calling mccollect(). I could suppress the warning, but this is more of a workaround than a solution.
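For reference, the suppression workaround mentioned above could be sketched roughly as follows. This is only an illustration, not what batchtools does: muffle_child_warnings is a hypothetical helper name, and the pattern is assumed from the warning text shown in the report. It silences only warnings whose message matches the pattern and lets every other warning through.

```r
# Hypothetical helper (not part of batchtools): evaluate `expr` and silence
# only warnings whose message contains `pattern`; all other warnings propagate.
muffle_child_warnings <- function(expr, pattern = "cannot wait for child") {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Possible usage with the script from the report (assumes `ids` and
# `registry` exist as defined there):
# muffle_child_warnings(waitForJobs(ids = ids, reg = registry))
```

Matching on the message text is fragile, since the wording may differ between R versions, which is part of why this is a workaround rather than a fix.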
> sessionInfo()
R Under development (unstable) (2019-02-08 r76071)
Platform: x86_64-apple-darwin17.7.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Users/ma38727/bin/R-devel/lib/libRblas.dylib
LAPACK: /Users/ma38727/bin/R-devel/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] batchtools_0.9.11 data.table_1.12.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 prettyunits_1.0.2 withr_2.1.2 digest_0.6.18
[5] crayon_1.3.4 assertthat_0.2.0 rappdirs_0.3.1 R6_2.3.0
[9] backports_1.1.3 magrittr_1.5 rlang_0.3.1 progress_1.2.0
[13] stringi_1.2.4 fs_1.2.6 brew_1.0-6 checkmate_1.9.1
[17] tools_3.6.0 hms_0.4.2 parallel_3.6.0 compiler_3.6.0
[21] pkgconfig_2.0.2 base64url_1.4
I don't think it's mccollect per se, but rather the finalizer on the R6 class running at the wrong time. I don't know enough about R6 classes to help further.
I had a similar issue in the past and, if I remember correctly, the warning listed PIDs different from the ones I was trying to collect.
If that is the case here, I think suppressing the warnings and making sure the requested PIDs are cleaned up (e.g. using tools::pskill) is the right approach. (Happy to be corrected by Martin or anyone else!)
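A minimal sketch of the tools::pskill cleanup idea, assuming the PIDs of the forked workers are known to the caller. kill_children is a hypothetical helper name, and whether batchtools should do anything like this is an open question in this thread.

```r
# Hypothetical helper: send SIGTERM to the given PIDs.
# tools::pskill() returns a logical vector indicating, per PID, whether
# the signal could be delivered; it is returned invisibly here.
kill_children <- function(pids) {
  invisible(tools::pskill(pids, signal = tools::SIGTERM))
}

# Possible usage on a Unix-alike (forking is not available on Windows):
# job <- parallel::mcparallel(Sys.sleep(60))
# kill_children(job$pid)
```

A signalled child still has to be reaped by the parent process, so this would complement collecting the job rather than replace it.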
This warning, "In selectChildren(jobs, timeout) : cannot wait for child 59480 as it does not exist", is a show stopper for multicore batchtools on CentOS. Do you need more details?
The multicore code has changed frequently in recent R releases, so I'm not sure there is a generic solution. I've made a small fix for R-3.6.x which should reduce the number of warnings (8d471e128f7bc51399da516fdf35bde7d02f34c1). Does this help?
Hopefully, @mllg's commit fixes this problem, but if not ...
All y'all, the warning "cannot wait for child NNNNN as it does not exist" was introduced in R 3.5.0. There were some bugs causing this warning to occur even when it shouldn't. It could be reproduced using the 'parallel' package alone. That particular problem was fixed in R 3.5.2.
For those who report seeing this warning, please make sure to share (a) what version of R you are using, and (b) what operating system you are on. Sharing your sessionInfo() covers both of these and more. If you're using R (>= 3.5.0 & < 3.5.2), then that's why you get the warning.
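A quick way to check whether a session falls into that version range might look like this. in_affected_range is a hypothetical helper name, and the bounds are taken directly from the comment above.

```r
# Hypothetical helper: TRUE if the R version lies in the range where the
# spurious warning was reported (>= 3.5.0 and < 3.5.2).
in_affected_range <- function(v = getRversion()) {
  v <- as.package_version(v)
  v >= "3.5.0" && v < "3.5.2"
}

# Check the running session:
in_affected_range()
```

The sessionInfo() posted earlier in this thread shows an R-devel build (3.6.0), which is outside this range, so the known R bug alone would not explain that report.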
If it turns out that there is still a bug in R itself, it would be awesome to narrow this down so that it can be resolved there.
My $.02