
Nested pmap calls do not work

Open · abx78 opened this issue · 4 comments

Hello, the following code, which has nested pmap calls, hangs even if I execute it with a single worker process (addprocs(1)).

using Distributed
addprocs(1)  # hangs even with a single worker process

@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task = collect(1:2)
    pmap(inner_process, inner_task)   # inner pmap, called from within the outer pmap
end

function do_job(jobs_list)
    pmap(outer_process, jobs_list)    # outer pmap, launched from the master
end

jobs_list = collect(1:10)
do_job(jobs_list)

This is the version I am using:

julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_DEPOT_PATH = C:\Program Files\ReSolver.DistributedJulia\depot
  JULIA_EDITOR = C:\Users\AppData\Local\atom\atom.exe
  JULIA_NUM_THREADS = 4
  JULIA_PKGDIR = C:\Program Files\ReSolver.DistributedJulia\packages

Am I misunderstanding the feature, or is this expected behavior?

abx78, Jun 11 '19

Note that this code executes correctly if no worker processes are added, i.e. one does not run addprocs and just uses the master process.
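
A quick way to check this (a minimal sketch; note the absence of addprocs, so only the master process exists):

using Distributed                     # no addprocs(): only process 1 is available
pmap(x -> pmap(y -> y^2, 1:2), 1:3)   # runs locally and returns [[1, 4], [1, 4], [1, 4]] without hanging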

jpsamaroo, Jun 11 '19

I guess this is expected behavior: the outer pmap already employs (by default) all available workers, so there are none left for the inner pmap to distribute to. Check the ?pmap docstring. However, when you specify which workers to use, everything works fine even with nested pmaps:

using Distributed
addprocs() # this yields workers 2:5 on my machine

@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task = collect(1:2)
    pmap(inner_process, WorkerPool([4, 5]), inner_task)  # inner level: workers 4 and 5
end

function do_job(jobs_list)
    pmap(outer_process, WorkerPool([2, 3]), jobs_list)   # outer level: workers 2 and 3
end

jobs_list = collect(1:10)
do_job(jobs_list)

returns the same result as running your original code on a single worker process.
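
For reference, a sketch that generalizes this to however many workers are available, splitting workers() in half instead of hardcoding the ids. Passing the inner pool's ids along with each job is just an illustrative pattern, not Distributed.jl API:

using Distributed
addprocs(4)                        # assume this yields workers 2:5

const w = workers()
const outer_ids = w[1:end÷2]       # e.g. [2, 3]
const inner_ids = w[end÷2+1:end]   # e.g. [4, 5]

@everywhere function inner_process(task_id)
    task_id^2
end

# ship the inner pool's ids with each job so each worker builds its own pool
@everywhere function outer_process((job_id, ids))
    pmap(inner_process, WorkerPool(ids), 1:2)
end

pmap(outer_process, WorkerPool(outer_ids), [(j, inner_ids) for j in 1:10])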

dkarrasch, Jun 12 '19

This example also works for me when using a worker pool containing all processes, including the master process. It seems the default worker pool for pmap does not contain process 1:

using Distributed
addprocs() # this yields workers 2:5 on my machine

@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task = collect(1:2)
    pmap(inner_process, WorkerPool(procs()), inner_task)
end

function do_job(jobs_list)
    pmap(outer_process, WorkerPool(procs()), jobs_list)  # procs() includes the master, process 1
end

jobs_list = collect(1:10)
do_job(jobs_list)

alandion, Jul 08 '19

@alandion It also works if a worker pool containing all workers, excluding the master process, is created. It seems the trick is to create an explicit worker pool when nesting. (Unlike the example given by @dkarrasch, here I am sharing all workers between the outer and the inner pmap.)

using Distributed
addprocs() # this yields workers 2:5 on my machine

@everywhere function inner_process(task_id)
    task_id^2
end

@everywhere function outer_process(job_id)
    inner_task = collect(1:2)
    pmap(inner_process, WorkerPool(workers()), inner_task)
end

function do_job(jobs_list)
    pmap(outer_process, WorkerPool(workers()), jobs_list)  # workers() excludes process 1
end

jobs_list = collect(1:10)
do_job(jobs_list)

For me it only hangs when an explicit WorkerPool or CachingPool isn't passed as an argument.
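
For completeness, a minimal sketch of the CachingPool variant mentioned above (assuming addprocs() yields workers 2:5); unlike a plain WorkerPool, CachingPool also caches the mapped function on each worker so it is shipped only once:

using Distributed
addprocs(4)

@everywhere inner_process(task_id) = task_id^2

@everywhere function outer_process(job_id)
    # an explicit CachingPool, like an explicit WorkerPool, avoids the hang
    pmap(inner_process, CachingPool(workers()), 1:2)
end

pmap(outer_process, CachingPool(workers()), 1:10)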

OkonSamuel, Jun 02 '20