MPI.jl

Add `IReduce!` and `IAllreduce!`

Open Keluaa opened this issue 1 year ago • 6 comments

Added basic nonblocking reductions, along with some tests.

Keluaa avatar Mar 13 '24 09:03 Keluaa

Mmm... only the CUDA tests fail. I feared this was a Julia-side issue, but no, it is simply that MPI_Iallreduce is not CUDA-aware in OpenMPI.

This is a known issue (https://github.com/open-mpi/ompi/issues/9845), and it also happens with MPI_Ireduce (https://github.com/open-mpi/ompi/issues/12045). I was also able to reproduce it in C.

It is very surprising to me that the ROCm support apparently covers all non-blocking ops, while the CUDA support does not.

What would be the best course of action? Merge anyway and let users stumble upon an unhelpful segfault? Or would a warning (when OpenMPI is in use and CUDA is loaded) be enough?

Keluaa avatar Jun 24 '24 14:06 Keluaa

Yeah, we don't currently have a good mechanism to declare which operations can and cannot take GPU memory. It seems even worse for OpenMPI, since the set of supported operations depends on whether or not UCX is used.

We certainly need to branch in the tests, but I don't think we have prior art for this.

@simonbyrne any ideas?

vchuravy avatar Jun 24 '24 17:06 vchuravy

Unfortunately it is probably implementation (and configuration) dependent, so I don't think we can provide a complete solution. My best suggestion would be to make it so the test suite can soft fail and report which operations are supported.
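
As a rough sketch of the soft-fail idea (not something MPI.jl provides today), each operation could be probed in its own Julia process, so that a crash such as a segfault is recorded as "not supported" instead of aborting the whole suite. The `probe` helper and the probe snippets below are hypothetical. A real probe would allocate a device buffer (for example a `CuArray`) and call the operation under test, and the child process is assumed to inherit an environment in which MPI.jl can be loaded.

```julia
# Run `code` in a fresh Julia process so that a crash cannot take down the
# caller. Returns true only if the child exits cleanly with status 0.
function probe(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return success(pipeline(cmd; stdout=devnull, stderr=devnull))
end

# Hypothetical probe snippets, one per operation to check.
probes = [
    "Allreduce (host memory)" =>
        "using MPI; MPI.Init(); MPI.Allreduce(ones(4), +, MPI.COMM_WORLD)",
    "deliberate failure" => "exit(1)",  # stand-in for an unsupported operation
]

# Collect the outcome of every probe instead of stopping at the first failure.
results = [name => probe(code) for (name, code) in probes]

# Report which operations worked.
for (name, ok) in results
    println(rpad(name, 28), ok ? "supported" : "NOT supported")
end
```

The test suite could then use such a report to skip, or mark as broken, the tests for operations that the local MPI build does not handle.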

If you want something easy that does work, the simplest option is to use the regular blocking operation spawned on a separate thread:

```julia
task = Threads.@spawn MPI.Allreduce(....)
# other work
wait(task)
```

If your other work involves MPI ops, you will also need to initialize MPI with `MPI.Init(threadlevel=:multiple)`.
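
Spelled out a bit more, the workaround might look like the following minimal sketch. The buffer contents and the stand-in "other work" are made up for illustration, and Julia is assumed to be started with at least two threads (for example `julia -t 2`) so the reduction can actually overlap with the other work.

```julia
using MPI

# Request full thread support, since MPI may be called from more than one thread.
MPI.Init(threadlevel=:multiple)
comm = MPI.COMM_WORLD

# Hypothetical data: each rank contributes a vector filled with its own rank.
send = fill(Float64(MPI.Comm_rank(comm)), 4)

# Run the blocking reduction on another thread.
task = Threads.@spawn MPI.Allreduce(send, +, comm)

# Other work that overlaps with the reduction (a stand-in computation here).
local_sum = sum(abs2, send)

# Block until the reduction has finished and collect its result.
result = fetch(task)
```

Using `fetch` instead of `wait` returns the reduced array directly. For GPU buffers the blocking `MPI.Allreduce` still requires a CUDA-aware MPI build, but it avoids the non-blocking code path that fails in OpenMPI.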

simonbyrne avatar Jun 25 '24 00:06 simonbyrne