batchtools

explicitly undo side effects of makeCluster

Open mtmorgan opened this issue 6 years ago • 2 comments

Some makeCluster* operations have side effects, e.g., opening connections:

> nrow(showConnections())
[1] 0
> cl = makeClusterFunctionsSocket(2)
> nrow(showConnections())
[1] 2

There is no way to 'undo' (e.g., destroyCluster(cl)) these side-effects, and they are not destroyed by, e.g., removeRegistry(). I realize that there is a finalizer, so

> rm(cl)
> gc()
          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells  542172 29.0     934164 49.9         NA   934164 49.9
Vcells 1473945 11.3    8388608 64.0      32768  8386115 64.0
> nrow(showConnections())
[1] 0

often works, but finalizers are not run in a deterministic order, so this is not robust.
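To make the distinction concrete, here is a minimal sketch in base R that contrasts an explicit close with a finalizer used only as a safety net. It does not use batchtools: `make_resource()` and `close_resource()` are hypothetical helpers, and a `textConnection` stands in for a worker socket.

```r
# Hypothetical helpers, not part of batchtools. A textConnection stands in
# for the socket connection that a cluster backend would open.
close_resource <- function(res) {
  # Idempotent: safe to call explicitly and again later from the finalizer.
  if (isTRUE(res$open)) {
    close(res$con)
    res$open <- FALSE
  }
  invisible(res)
}

make_resource <- function() {
  res <- new.env()
  res$con <- textConnection("payload")
  res$open <- TRUE
  # The finalizer is only a fallback; when it runs is up to the garbage collector.
  reg.finalizer(res, function(e) close_resource(e), onexit = TRUE)
  res
}

before <- nrow(showConnections())
res <- make_resource()
during <- nrow(showConnections())

# Deterministic cleanup: no rm() / gc() needed.
close_resource(res)
after <- nrow(showConnections())
```

With an explicit close available, the finalizer becomes a no-op in the common case, so its ordering relative to other finalizers no longer matters.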

mtmorgan, Feb 09 '19 15:02

Thanks for reporting. I guess I need to implement something like cf$startCluster() and cf$stopCluster() and call it internally in submitJobs(). This has the drawback that submitJobs() would have to wait for all jobs to finish, and thus asynchronicity is lost.

Just out of curiosity, where did this come up? Is this a problem while running R CMD check or for real world applications?

mllg, Feb 11 '19 20:02

It is related to #221 and to checks in https://github.com/BiocParallel, both of which have consequences in real-world applications (I think). The connections are still open because the finalizer hasn't run yet. When it does run, the order in which finalizers run is not deterministic (see https://stat.ethz.ch/pipermail/r-devel/2011-July/061612.html). Each finalizer is added to a linked list of SEXPs, and the order of elements in that list depends on what other objects are added to or removed from it. Every so often, the finalizer runs at a time when symbols it references (e.g., the socket connection used in serialize() to send the "DONE" signal to the worker) have already been cleaned up, and this signals an error.

For my use case I would be happy to be able to enforce synchronicity by calling stopCluster() directly (I don't think the 'user' has access to the Socket instance directly?)
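For comparison, this is roughly the pattern I have in mind, sketched with the base parallel package rather than batchtools (which has no such call today). `with_cluster()` is a hypothetical wrapper; `makePSOCKcluster()` and `stopCluster()` are the real parallel functions.

```r
library(parallel)

# Hypothetical wrapper: ties the lifetime of the workers to one function call.
with_cluster <- function(n, fun) {
  cl <- makePSOCKcluster(n)
  # Stop the workers and close their sockets on exit, even if fun() errors,
  # instead of waiting for a finalizer to do it at some later point.
  on.exit(stopCluster(cl), add = TRUE)
  fun(cl)
}

before <- nrow(showConnections())
res <- with_cluster(2, function(cl) {
  unlist(clusterApply(cl, 1:2, function(x) x * 2))
})
after <- nrow(showConnections())
```

The cost is the one mentioned above: the caller blocks until the work is done, so this only suits cases where synchronous behaviour is acceptable.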

mtmorgan, Feb 11 '19 20:02