Add SPMD interface

Open jpsamaroo opened this issue 1 year ago • 0 comments

Similar to MPI's collective and point-to-point (P2P) operations, this provides an interface for writing code in the Single Program Multiple Data (SPMD) style. This is useful for implementing data-parallel algorithms, such as Distributed Data Parallel (DDP) training for ML, where every rank runs the same program on its own shard and only occasionally combines results.

Example usage:

fetch.(spmd(4; parallelize=:workers) do # Run one SPMD program per Distributed worker
  rank = spmd_rank() # from 1:4
  comm_size = spmd_size() # size of "comm" is 4
  all_ranks = spmd_exchange(rank) # pass in our own rank, get a vector of all ranks
  sum_of_ranks = only(spmd_reduce(+, [rank])) # reduces all ranks with +
end)
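
To make the intended semantics of the calls above concrete, here is a minimal single-process sketch that uses only Base Julia tasks and Channels. It is not the implementation in this PR: `toy_spmd` and `toy_exchange` are hypothetical names made up for illustration, ranks are emulated as tasks rather than Distributed workers, and the reduction is computed locally from the exchanged values instead of through a dedicated `spmd_reduce`.

```julia
# Toy, single-process stand-in for the proposed interface (hypothetical names).
# Uses only Base Julia tasks and Channels; this is not Dagger's implementation.

# Run `f(rank, inboxes)` once per rank, each in its own task, and collect results.
function toy_spmd(f, n::Int)
    # One inbox per rank, buffered so that n sends into it never block.
    inboxes = [Channel{Any}(n) for _ in 1:n]
    tasks = map(1:n) do rank
        Threads.@spawn f(rank, inboxes)
    end
    return fetch.(tasks)
end

# All-to-all exchange: every rank contributes `value` and gets back a vector
# holding the values of all ranks, ordered by rank.
function toy_exchange(rank::Int, inboxes, value)
    n = length(inboxes)
    # Send our (rank, value) pair to every rank, including ourselves.
    for inbox in inboxes
        put!(inbox, (rank, value))
    end
    # Receive exactly one message from each rank and slot it in by sender.
    received = Vector{Any}(undef, n)
    for _ in 1:n
        sender, v = take!(inboxes[rank])
        received[sender] = v
    end
    return map(identity, received) # narrow the element type
end

# Only one communication round is performed per program here; a second round
# on the same inboxes would need tags or a barrier to keep rounds apart.
results = toy_spmd(4) do rank, inboxes
    all_ranks = toy_exchange(rank, inboxes, rank) # vector of all ranks
    sum_of_ranks = reduce(+, all_ranks)           # local reduction with +
    (rank = rank, all_ranks = all_ranks, sum_of_ranks = sum_of_ranks)
end
```

In this sketch every rank ends up with `all_ranks == [1, 2, 3, 4]` and `sum_of_ranks == 10`, which is the result the example usage above is expected to produce on each worker.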

Todos:

  • [ ] Reconsider using RemoteChannels for communication
  • [ ] Add tags
  • [ ] Consider putting all spmd_* methods into their own module

jpsamaroo, Sep 27 '24 15:09