callr
Signal interrupts are not propagated to worker nodes in background sessions with `parallel` clusters
Suppose you have the following background session object with a parallel cluster object.
# Create a new `R` background session.
session <- callr::r_session$new()
# Create a cluster with two worker processes and record the process IDs.
worker_pids <- session$run(function() {
    # Create a cluster.
    cluster <<- parallel::makeCluster(spec = 2, type = "PSOCK")
    # Return the worker process IDs.
    return(unlist(parallel::clusterCall(cluster, Sys.getpid)))
})
# Run a task in the background that keeps the cluster busy indefinitely.
session$call(function() {
    # Send the task to the cluster workers.
    parallel::parSapply(cluster, 1:10, function(x) {
        # An infinite loop.
        while (TRUE) { Sys.sleep(0.0001) }
    })
})
# Interrupt the session.
session$interrupt()
# Read the session interruption condition.
session$read()
# Confirm the session is `idle`.
session$get_state()
On macOS, the following call hangs indefinitely because the workers are still busy; I believe the SIGINT signal is not propagated to them. On Windows, it returns the expected output.
# Run a subsequent cluster call and output the results.
session$run(function() {
    parallel::clusterCall(cluster, print, "Subsequent cluster call.")
})
At this point I sent a manual interrupt (i.e., Ctrl + C) on macOS.
# Close the session.
session$close()
# Remove the reference to the `session` object.
rm(session)
# Request garbage collection.
gc(full = TRUE)
On macOS the worker processes are still running.
# Check the status of the worker processes.
lapply(worker_pids, function(pid) {
    ps::ps_is_running(
        ps::ps_handle(pid)
    )
})
# Clean up by terminating the leftover worker processes.
tools::pskill(worker_pids, tools::SIGTERM)
Is this expected behavior? Am I missing something obvious?
> I believe the SIGINT signal is not propagated to the workers
I don't know if it is a good idea to propagate signals from the callr subprocess recursively. Even if it is, I am not sure it is possible at all. What is possibly happening on Windows is that the parallel subprocesses share a console with the callr subprocess, so they are interrupted as well. Apparently that is not the case on Unix, where the callr subprocess probably has no controlling terminal at all.
In any case, callr will not manage the subprocesses you start with parallel; you'll have to take care of those yourself.
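For anyone landing here, taking care of the workers yourself could look roughly like the sketch below. It relies only on `tools::pskill()`, which the original post already uses for cleanup. The helper name `terminate_workers` and its `grace` argument are hypothetical, not part of callr or parallel. It also assumes the recorded PIDs still belong to the workers; PIDs can be reused by the operating system, so call it promptly.

```r
# Hypothetical helper (not part of callr or parallel): terminate worker
# processes by PID, first politely, then forcefully.
terminate_workers <- function(pids, grace = 1) {
    pids <- as.integer(pids)
    # Ask each worker to exit. On Windows, `tools::pskill()` always
    # terminates the process, regardless of the signal.
    asked <- tools::pskill(pids, tools::SIGTERM)
    # Give the workers a moment to shut down.
    Sys.sleep(grace)
    # Escalate to SIGKILL where that constant is defined (it is NA on
    # Windows). Signalling an already exited process just returns FALSE.
    if (!is.na(tools::SIGKILL)) {
        tools::pskill(pids, tools::SIGKILL)
    }
    # Return whether the initial SIGTERM was delivered to each PID.
    invisible(asked)
}

# Intended usage with the objects from the original post (not run here):
# session$interrupt()
# terminate_workers(worker_pids)
# session$close()
```

After this, the `cluster` object inside the session is presumably unusable, so a new cluster would have to be created before sending further work.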