DistributedArrays.jl: Potential fix for when the master has compute-intensive work and must schedule workers

Fixes issue #206; please see the issue description for an explanation of the fix.

Jun 06 '19 13:06 raminammour

@andreasnoack Any thoughts on this?

May 19 '20 22:05 ViralBShah

It's a good observation, and a pretty simple, though not super pretty, fix.

I'm wondering if, with the new multithreading, we can now just delegate all the scheduling to a separate task that won't block while the local work is being executed. I'd like to hear @vchuravy's thoughts.

May 20 '20 13:05 andreasnoack

Looking at the code, the pattern

```julia
@sync for i in pids
    @async remotecall_fetch(do_work, i, ...)
end
```

is common (and natural), so this may happen anywhere `do_work` is heavy. I guess adding `yield()` in the correct places would work...
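A minimal sketch of the `yield()` idea, assuming `do_work` takes no extra arguments and is defined on every process (e.g. via `@everywhere`); `run_on_all` is a hypothetical name, not part of DistributedArrays.jl. When the target id equals `myid()`, `remotecall_fetch` runs `do_work` in the current task, so the tasks for the remote workers cannot send their requests until it returns. Yielding once before the local call lets them go first in the common case, though it is not a hard guarantee.

```julia
using Distributed

# Hypothetical helper: run `do_work` on every process in `pids` and collect the results.
function run_on_all(do_work, pids)
    results = Vector{Any}(undef, length(pids))
    @sync for (k, p) in enumerate(pids)
        @async begin
            # For the local id, remotecall_fetch executes do_work in this task
            # without yielding. Yield first so the already-scheduled remote tasks
            # can send their requests (and block on the replies) before we start.
            p == myid() && yield()
            results[k] = remotecall_fetch(do_work, p)
        end
    end
    return results
end
```

Usage: `run_on_all(my_heavy_kernel, procs())` keeps the same call order as the original loop. Only the local task changes its behaviour.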

Or, at construction of a DArray, adopt the convention that the id == myid() comes last, while preserving the invariant that `pids[i]` holds chunk i.
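The reordering convention could look roughly like the following sketch. `local_last` is a hypothetical helper, not an existing DistributedArrays.jl function. The assumption is that the reordered list is used wherever the DArray's process list is chosen, so chunk i still lives on `pids[i]` and the master simply owns the last chunk.

```julia
using Distributed

# Hypothetical helper: move the master's id (myid()) to the end of a process
# list, keeping the relative order of the remote ids. With the master last,
# `@sync for p in pids; @async remotecall_fetch(...)` dispatches every remote
# request before the master starts its own blocking local work.
function local_last(pids)
    me = myid()
    return [filter(!=(me), pids); filter(==(me), pids)]
end
```

Since only the order of ids changes, no data movement is implied. The cost is a convention every constructor has to follow.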

Cheers!

May 20 '20 13:05 raminammour

I think we need to carefully go through Distributed.jl and look at whether we can start using `@spawn` instead of `@async`, and then do the same for DistributedArrays.jl. It won't be easy, since a whole bunch of this code is based on cooperative tasking, and switching to parallelism will expose races.
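As a rough illustration of the direction, here is a variation (an assumption, not the planned design): keep the remote calls as cooperative `@async` tasks, but start the master's own share with `Threads.@spawn` so it need not occupy the task that dispatches the remote requests. `run_on_all_threaded` is a hypothetical name. This can only help when Julia is started with more than one thread (e.g. `julia -t 2`), and even then which thread picks up the local work is up to the scheduler. It also shows where races come from: `do_work` now runs concurrently with whatever else the master is doing.

```julia
using Distributed

# Hypothetical sketch: remote work stays on cooperative @async tasks, while
# the master's local share runs as a parallel task via Threads.@spawn.
# With a single thread this behaves much like the original loop.
function run_on_all_threaded(do_work, pids)
    tasks = map(pids) do p
        if p == myid()
            # May run on another thread; any state it shares with the
            # master must be thread-safe.
            Threads.@spawn do_work()
        else
            @async remotecall_fetch(do_work, p)
        end
    end
    return fetch.(tasks)  # results in the same order as pids
end
```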

I might be able to have a UROP look at this transition.

May 20 '20 13:05 vchuravy