
r_bg processes hang when waiting for others to spawn

Open angel-bee2018 opened this issue 7 months ago • 1 comments

Hi,

I'm using r_bg in a custom function to manage my parallel processing needs. As part of this, I have written a for loop that automatically spawns a set number of background tasks that process small chunks of a larger dataset depending on how many CPU cores I want to use.

The code itself is quite simple:

# Spawn one background R process per chunk and keep its handle as chunk_<i>.
for (..i in seq_len(.num_workers)) {
  assign(
    x = paste("chunk_", ..i, sep = ""),
    value = callr::r_bg(
      cmdargs = c(.temp_path, ..i),
      args = list("f" = function_current_instance),
      func = function_to_run
    )
  )
}

However, I am noticing a major issue, especially with large datasets. The individual child R processes take a long time to spawn, which is understandable. The problem is that once the child processes have finished loading, they hang until the for loop in the parent process has finished.

I have attached a screenshot of this issue - the R process at the very bottom has freshly spawned, and takes around 20 seconds to fully load in. The bunch of R processes at the top have already finished loading and are ready to go, but are somehow paused.

It seems like there's some dependency on waiting for the parent R process to be free, and the pausing even happens mid-execution if one worker finishes and a new one is spawning. Is there any way to prevent this pausing of child processes?

I am on R 4.3.0, Ubuntu 20.04 LTS, kernel 5.15.0-89.

Many thanks in advance

image

angel-bee2018, Nov 29 '23 17:11