Support for All to All Operations
DART should provide both blocking and nonblocking all-to-all operations, corresponding to the following MPI calls:
- blocking: MPI_Alltoall and MPI_Alltoallv
- nonblocking: MPI_Ialltoall and MPI_Ialltoallv
Currently, the only way to transpose an N-dimensional array is to use one-sided operations. This does not scale to a large number of processes, where logarithmic complexity is imperative, in contrast to the linear complexity of the "naive" transpose.
An interesting use case for dart_Ialltoallv is dash::sort, where all processors have to distribute their local data to the corresponding target processors. This communication can be overlapped with local copy operations.
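To make the sort use case concrete, here is a minimal sketch of how each unit could derive the per-target counts and displacements that an alltoallv-style call expects (MPI_Alltoallv takes them as `sendcounts` and `sdispls`). It assumes the local data is already sorted and that global splitters have been agreed on beforehand. `SendLayout` and `make_send_layout` are hypothetical names used only for illustration; they are not part of DART or DASH, and this is not how dash::sort is necessarily implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of DART or DASH): the per-target
// counts and displacements that an alltoallv-style call expects.
struct SendLayout {
  std::vector<int> counts;  // counts[p]: number of elements sent to unit p
  std::vector<int> displs;  // displs[p]: offset of that block in the local buffer
};

// Partition a locally sorted buffer by (nunits - 1) ascending splitters.
// Unit p receives the elements in (splitters[p-1], splitters[p]]; the last
// unit receives everything greater than the last splitter.
inline SendLayout make_send_layout(const std::vector<int>& sorted_local,
                                   const std::vector<int>& splitters) {
  assert(std::is_sorted(sorted_local.begin(), sorted_local.end()));
  assert(std::is_sorted(splitters.begin(), splitters.end()));

  const std::size_t nunits = splitters.size() + 1;
  SendLayout layout{std::vector<int>(nunits, 0), std::vector<int>(nunits, 0)};

  auto block_begin = sorted_local.begin();
  for (std::size_t p = 0; p < nunits; ++p) {
    // Binary search for the end of the block destined for unit p.
    auto block_end = (p + 1 < nunits)
        ? std::upper_bound(block_begin, sorted_local.end(), splitters[p])
        : sorted_local.end();
    layout.counts[p] = static_cast<int>(block_end - block_begin);
    block_begin = block_end;
  }

  // Displacements are the exclusive prefix sum of the counts.
  for (std::size_t p = 1; p < nunits; ++p) {
    layout.displs[p] = layout.displs[p - 1] + layout.counts[p - 1];
  }
  return layout;
}
```

With a nonblocking variant, a unit could start the exchange using this layout, copy the block addressed to itself locally while the transfer is in flight, and only then wait for completion.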
I think there was once a student from TUM who had a dart_alltoall implementation. I will look into that.