Distributed Training

Open · avik-pal opened this issue 1 year ago · 0 comments

For an immediate solution, see https://github.com/avik-pal/FluxMPI.jl/.

With package extensions, this no longer needs to live in a separate package. Instead, Lux.jl can ship distributed-training backends that are activated through extensions when the corresponding packages are loaded:

  • [ ] MPI Vanilla Version
  • [ ] CUDA-Aware MPI
  • [ ] NCCL for CuArrays. Needs https://github.com/JuliaGPU/NCCL.jl/issues/50
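
As a rough sketch of how the three backends above could be declared, the `Project.toml` fragment below uses Julia's `[weakdeps]` and `[extensions]` tables. The extension names (`LuxMPIExt`, `LuxCUDAMPIExt`, `LuxNCCLExt`) are hypothetical and not taken from this issue, and the UUID strings are placeholders to be filled in from the registry. How a CUDA-aware MPI build would be detected at runtime is also left open here; the sketch only assumes that loading both CUDA.jl and MPI.jl is the trigger.

```toml
# Hypothetical fragment of Lux.jl's Project.toml (sketch only).

[weakdeps]
# Placeholder values: replace with each package's UUID from the General registry.
CUDA = "<CUDA.jl UUID>"
MPI = "<MPI.jl UUID>"
NCCL = "<NCCL.jl UUID>"

[extensions]
# Each extension is loaded only once all of its listed weak dependencies
# have been loaded in the session.
LuxMPIExt = "MPI"                # vanilla MPI backend
LuxCUDAMPIExt = ["CUDA", "MPI"]  # CUDA-aware MPI path (assumed trigger)
LuxNCCLExt = ["CUDA", "NCCL"]    # NCCL backend for CuArrays
```

With a layout like this, a user who never loads MPI.jl or NCCL.jl pays no load-time cost for them, which is the main reason a separate package would no longer be required.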

avik-pal · Feb 18 '24 22:02