Dagger.jl

Distribute the scheduler!

Open jpsamaroo opened this issue 4 years ago • 2 comments

This PR allows the scheduler to execute on every worker in the cluster. We first expand the notion of "thunk ID" to be per-worker, so that each worker can allocate unique IDs locally (and so that we can later locate where a thunk was created). We then allow the eager scheduler code to execute on all workers, instead of remotecall'ing to worker 1, and allow thunks to be registered and scheduled locally, which should make recursive runtime-generated graphs vastly more efficient because they no longer require a trip over the network. Finally, we implement local (and optionally remote?) work stealing, strictly for already-scheduled tasks for the time being, to keep work balanced. The newly available per-worker scheduler metrics will make it possible to optimize the choice of processor to steal from, although this can be left for later work.
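To illustrate the per-worker thunk ID idea described above, here is a minimal sketch that assumes an ID is a pair of the creating worker's ID and a locally allocated counter. The names `WorkerThunkID`, `LOCAL_THUNK_COUNTER`, `next_thunk_id`, and `owner` are hypothetical and are not Dagger's actual types or functions; only `Distributed.myid` and the `Threads.Atomic` API are real.

```julia
using Distributed

# Hypothetical ID type (not Dagger's API): the worker that created the thunk,
# plus a counter value that is unique only within that worker.
struct WorkerThunkID
    wid::Int  # worker on which the thunk was created
    id::Int   # worker-local sequence number
end

# One counter per process, so allocating an ID never needs a network call.
const LOCAL_THUNK_COUNTER = Threads.Atomic{Int}(1)

# atomic_add! returns the previous value, so concurrent callers on the same
# worker each receive a distinct number.
next_thunk_id() = WorkerThunkID(myid(), Threads.atomic_add!(LOCAL_THUNK_COUNTER, 1))

# The creating worker can be read straight off the ID.
owner(tid::WorkerThunkID) = tid.wid
```

Because the worker ID is part of the identifier, two workers can allocate IDs independently without colliding, and any worker holding an ID can tell which scheduler to ask about that thunk.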

Todo:

  • [x] Execute the eager scheduler on all workers
  • [ ] Implement local work stealing with ConcurrentCollections.jl's WorkStealingDeque
  • [ ] Tests for @spawn/add_thunk! with thunks owned by other schedulers
  • [ ] Document new behavior
  • [ ] Validate the web dashboard shows remote scheduler data
  • [ ] Benchmarks
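For the work-stealing item in the list above, the access pattern can be sketched with a simple lock-protected deque. This is only an illustration under stated assumptions: it swaps in a `ReentrantLock` and a `Vector` rather than the lock-free `WorkStealingDeque` from ConcurrentCollections.jl, and the names `LockedDeque`, `push_task!`, `pop_local!`, and `steal!` are hypothetical.

```julia
# Hypothetical lock-based stand-in for a work-stealing deque (Base only).
struct LockedDeque{T}
    lock::ReentrantLock
    items::Vector{T}
end
LockedDeque{T}() where {T} = LockedDeque{T}(ReentrantLock(), T[])

# The owning processor pushes newly scheduled tasks onto the back.
function push_task!(d::LockedDeque, task)
    lock(d.lock) do
        push!(d.items, task)
    end
    return d
end

# The owner takes its own work from the back (most recently pushed first).
function pop_local!(d::LockedDeque)
    lock(d.lock) do
        isempty(d.items) ? nothing : pop!(d.items)
    end
end

# A thief takes from the front (oldest task); returns nothing if empty.
function steal!(d::LockedDeque)
    lock(d.lock) do
        isempty(d.items) ? nothing : popfirst!(d.items)
    end
end
```

A short usage example:

```julia
mine = LockedDeque{Int}()
foreach(i -> push_task!(mine, i), 1:4)
steal!(mine)      # a thief receives 1, the oldest task
pop_local!(mine)  # the owner receives 4, the newest task
```

Taking from opposite ends means the owner and a thief only compete for the same task when the deque is nearly empty, which is the property a lock-free deque exploits to keep the owner's path cheap.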

jpsamaroo avatar Dec 05 '21 16:12 jpsamaroo

Closes https://github.com/JuliaParallel/Dagger.jl/issues/165

jpsamaroo avatar Dec 05 '21 17:12 jpsamaroo

I'm considering raising the minimum supported Julia version to 1.7 going forward, so that we can easily work with atomics and use packages like ConcurrentCollections.jl and AtomicArrays.jl. If we do this, we'll do a minor version bump to 0.15, and keep 0.14.x as the last set of versions supporting Julia 1.6. We'll backport bug fixes and critical performance work to that branch if they're reproducible on Julia 1.6, and I'm also happy to backport any features that can be safely used on Julia 1.6 (assuming they don't rely on more recent changes, such as those in this PR).

jpsamaroo avatar Dec 07 '21 19:12 jpsamaroo

This is being developed on jps/dev, and will be re-posted when ready.

jpsamaroo avatar Apr 11 '24 22:04 jpsamaroo