Dynamic @distributed scheduling
It is great that @threads now supports (and uses by default) :dynamic scheduling. As a mirror to that, it would make quite a bit of sense if @distributed could handle dynamic scheduling. As I currently understand it, the best way to currently achieve this would be:
using Distributed
addprocs(whatever)
@everywhere function dynamicforeach(f, channel)
while isopen(channel)
try
task = take!(channel)
if isnothing(task)
close(channel)
else
f(task)
end
catch e
e isa InvalidStateException && e.state === :closed && break
rethrow()
end
end
end
tasks = 1:20
channel = RemoteChannel(() -> Channel{Union{eltype(tasks), Nothing}}(length(tasks)+1))
foreach(t -> put!(channel, t), tasks)
put!(channel, nothing) # Sentinel, end-task
@everywhere dynamicforeach(task -> println("did $task"), $channel)
It would be both much nicer, and I think rather appropriate if @distributed accepted a :static/:dynamic scheduling argument like @threads does, and allowed for the following instead of the above:
@distributed :dynamic for task in 1:20
println("did $task")
end
Going further, I think that :dynamic could actually be a sensible default for distributed scheduling.
Xref: JuliaLang/julia#17887 for having a consistent distributed/threaded API
Xref: JuliaLang/julia#41966 for tedious channel taking
Xref: JuliaLang/julia#48515 for iterating a RemoteChannel
Xref: JuliaLang/julia#33892 for maybe using a CachingPool too?
How would this be different than pmap?
In effect? Upon inspection, not much, since pmap already does dynamic scheduling*. However, I think this is could worth doing for the sake of discovery and API consistency. See the first X-ref.
*Maybe this is worth adding to the docs? I wasn't actually aware of this until I tried it.
For me @distributed has been something that I would love to remove, but of course we can't since that would be API breaking.
But Distributed.jl needs some love so I don't want to discourage you :)