future: worker "pool" for nested parallelization

Open epruesse opened this issue 4 years ago • 2 comments

If I understand correctly, plan(tweak(multicore, workers=8)) means that the first nesting level gets 8 parallel threads and the second nesting level gets no parallelism. I could hard-allocate threads to each level, but that's hard to do since it means I have to know all thread usages down the tree of packages.

What I'm looking for is a "worker pool"-like implementation. A naive greedy allocation using a semaphore that is decremented every time a thread is forked off would be a good start. That way, if I have a loop of three iterations, each calling a package that uses future.apply on a huge vector but takes very long to even get there, the NN workers can be kept busy for as much of the time as possible.
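To make the greedy idea concrete, here is a minimal sketch of such a counter in base R. It is not part of future; `make_pool`, `acquire` and `release` are hypothetical names, and the counter lives inside a single R process, so a real implementation would still need some cross-process synchronization that this sketch does not attempt.

```r
# Hypothetical in-process "semaphore" for greedy worker allocation.
# Assumption: everything runs in one R process; no real locking is done.
make_pool <- function(total) {
  total <- as.integer(total)
  free <- total
  # Grant as many slots as are free, up to the number requested.
  acquire <- function(wanted) {
    granted <- min(as.integer(wanted), free)
    free <<- free - granted
    granted
  }
  # Hand slots back, never exceeding the original pool size.
  release <- function(n) {
    free <<- min(total, free + as.integer(n))
    invisible(free)
  }
  list(acquire = acquire, release = release, available = function() free)
}

pool <- make_pool(8)
outer_slots <- pool$acquire(3)   # the loop of three takes 3 slots
inner_slots <- pool$acquire(16)  # the inner level asks for 16, gets the rest
pool$release(inner_slots)        # inner work done, slots return to the pool
```

The point of the sketch is only that an inner level asks for what it could use and receives what is left, rather than being told up front how many workers it owns.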

Interaction with OpenMP (OMP) in particular is a problem, of course. A lot of things seem to use that. IIRC, Intel TBB auto-detects the number of "useful" threads to use and adjusts this value as it goes based on system load. Something like this would need extra housekeeping, but the concept of "don't start more threads if all my workers/CPUs are busy", or even "don't start more threads if we are at XY% memory", would be very useful for robustly running things in parallel.

Feb 21 '20 17:02 epruesse

If I understand correctly, plan(tweak(multicore, workers=8)) means that the first nesting level gets 8 parallel threads and the second nesting level gets no parallelism. I could hard-allocate threads to each level, but that's hard to do since it means I have to know all thread usages down the tree of packages.

Correct x 2.

What I'm looking for is a "worker pool"-like implementation. That way, if I have a loop of three iterations, each calling a package that uses future.apply on a huge vector but takes very long to even get there, the NN workers can be kept busy for as much of the time as possible.

I'm not sure I fully understand, but I can guess what you're after. Basically, if you do:

a <- future_lapply(x, function(y) {
   future_lapply(y, function(z) {
      ...
   })
})

you want the inner and the outer "loops" to be able to pull from the same pool of "workers", correct?

This is available if you use an external job scheduler, such as those available in HPC environments. Then you could use:

plan(list(outer = batchtools_slurm, inner = batchtools_slurm))

Both layers will submit their jobs (= futures) to the same job queue, and it's up to the job scheduler to allocate resources as they become available.
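As a rough illustration of what "the same job queue" means for the nested future_lapply() case, here is a toy, single-process model in base R. It is only a sketch: `run_shared_queue` is a hypothetical name, nothing runs in parallel, and it assumes the inner jobs can be enumerated up front, which real nested futures cannot always do.

```r
# Toy model of one job queue shared by two nesting levels.
# Assumption: jobs are processed in order by a single process.
run_shared_queue <- function(outer_inputs, inner_fun) {
  # Each queue entry is one inner job, tagged with its outer element.
  queue <- list()
  for (i in seq_along(outer_inputs)) {
    for (z in outer_inputs[[i]]) {
      queue[[length(queue) + 1L]] <- list(outer = i, value = z)
    }
  }
  results <- vector("list", length(outer_inputs))
  # A scheduler would hand each job to whichever worker is free;
  # here the jobs are simply evaluated one after another.
  for (job in queue) {
    results[[job$outer]] <- c(results[[job$outer]], inner_fun(job$value))
  }
  results
}

res <- run_shared_queue(list(1:3, 4:5), function(z) z^2)
```

Because every inner job sits in the same queue, no worker is reserved for a particular nesting level; which worker picks up which job is left entirely to the scheduler.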

Trying to implement something similar in R is tedious but should be doable. Maybe one could build upon Gábor Csárdi's work in Multi Process Task Queue in 100 Lines of R Code, 2019-09-09. But the point is, this is not really something that should be implemented in the future package. Instead, it should/could be added as a new type of backend that futures can rely on - think:

library(future.taskqueue)
plan(list(outer=taskqueue, inner=taskqueue))
...

The future.tests package can be used to validate that it is properly implemented and meets the requirements of the future framework.

Mar 22 '20 23:03 HenrikBengtsson

Yes, that's what I meant. Though I was thinking less about nested loops in client code that are known to the user and easily configured with plan(list(...)), and more about the levels hidden in library code. The docs tell package authors to stay away from plan, so I was initially assuming that there would be some kind of queue dealing with the levels of nesting hidden from me.

That would be my main argument for allowing a simple queue scheduler into future - it's the simplest approach to arrive at "least surprising" behavior. A fully featured scheduler is clearly out of scope. The more packages use future themselves, though, the more complicated it becomes for the end user to set the right plan everywhere.

Another argument might be that future would be the natural home for a call that says something like "use up to 4 threads here". The knowledge of what degree of parallelism is beneficial sits within the package (and preferably not in the vignette), and would ideally be hidden from consuming client code.
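Something along these lines is what I have in mind, sketched in base R. `cap_workers` is a hypothetical helper, not an existing future function; a future-based package would presumably pass in something like future::availableCores() instead of parallel::detectCores().

```r
# Hypothetical helper: the package states how many workers are useful,
# and the result is capped by what the machine (or pool) can offer.
cap_workers <- function(useful = 4L, available = parallel::detectCores()) {
  # detectCores() can return NA; fall back to a single worker.
  if (is.na(available)) available <- 1L
  max(1L, min(as.integer(useful), as.integer(available)))
}

n_big   <- cap_workers(4, available = 8)  # machine has room: use all 4
n_small <- cap_workers(4, available = 2)  # only 2 available: use 2
```

The package only declares its useful upper bound; how many workers it actually gets stays a decision made outside of the package.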

(I wish I could promise a PR, but it would be easier to promise that I'll never find the time...).

Mar 29 '20 21:03 epruesse
