callr
r_bg processes hang when waiting for others to spawn
Hi,
I'm using r_bg in a custom function to manage my parallel processing needs. As part of this, I have written a for loop that automatically spawns a set number of background tasks, each processing a small chunk of a larger dataset; the number of tasks depends on how many CPU cores I want to use.
The code itself is quite simple:
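```r
# Spawn one background R process per worker and store its handle
# in a variable named chunk_1, chunk_2, ...
for (..i in 1:.num_workers) {
  assign(
    x = paste("chunk_", ..i, sep = ""),
    value = callr::r_bg(
      cmdargs = c(.temp_path, ..i),
      args = list("f" = function_current_instance),
      func = function_to_run
    )
  )
}
```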
However, I am noticing a serious issue, especially with large datasets. The individual child R processes take a long time to spawn, which is understandable, but once they have all loaded they hang until the for loop in the parent process has finished.
I have attached a screenshot of this issue. The R process at the very bottom has just spawned and takes around 20 seconds to load fully. The R processes above it have already finished loading and are ready to go, but they are somehow paused.
It seems as though the child processes are waiting for the parent R process to be free. The pausing even happens mid-execution: when one worker finishes and a new one is being spawned, the running workers pause. Is there any way to prevent this pausing of child processes?
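For comparison, here is a minimal sketch of the same spawning step that keeps the r_bg() handles in a list and polls them from the parent, reading each child's stdout and stderr while it waits. It rests on an assumption the screenshot does not confirm: with r_bg()'s default stdout = "|" and stderr = "|", a child whose output pipe fills up can block until the parent reads from it. The values of .num_workers, function_to_run and function_current_instance below are hypothetical stand-ins so the sketch runs on its own. cmdargs is left at its default, because the original cmdargs = c(.temp_path, ..i) replaces callr's default command-line arguments rather than adding to them.

```r
# Hypothetical stand-ins for the objects used in the original loop
.num_workers <- 4
function_to_run <- function(f) f(1:10)
function_current_instance <- function(x) sum(x)

# Keep the r_process handles in a list instead of assign()-ing chunk_1, chunk_2, ...
workers <- lapply(seq_len(.num_workers), function(i) {
  callr::r_bg(
    func = function_to_run,
    args = list(f = function_current_instance)
  )
})

# While any worker is alive, wait (up to 1000 ms per call) for activity and
# drain stdout/stderr so that no child can block on a full pipe
# (assumption: a full pipe is one possible cause of the pausing)
while (any(vapply(workers, function(p) p$is_alive(), logical(1)))) {
  processx::poll(workers, 1000)
  for (p in workers) {
    p$read_output()
    p$read_error()
  }
}

# Collect each worker's return value once all of them have exited
results <- lapply(workers, function(p) p$get_result())
```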
I am on R 4.3.0, Ubuntu 20.04 LTS, kernel 5.15.0-89.
Many thanks in advance