Mike Wilkins
## Background information

I was investigating an Allreduce performance issue for 64k processes @ 32kB message size.

### What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git...
## Pull Request Description

Adding the recursive multiplying allreduce algorithm. This algorithm achieves better performance than existing algorithms. They were designed with DOE's new exascale machines (Frontier, Aurora, El Capitan)...
Compiling on OpenSUSE 15.4, I got the following compile error:

```
In file included from ../src/mpid/ch4/shm/src/../posix/../ipc/src/ipc_p2p.h:14:0,
                 from ../src/mpid/ch4/shm/src/../posix/posix_coll_gpu_ipc.h:61,
                 from ../src/mpid/ch4/shm/src/../posix/posix_coll.h:12,
                 from ../src/mpid/ch4/shm/src/../posix/shm_inline.h:16,
                 from ../src/mpid/ch4/shm/src/shm_coll.h:10,
                 from ../src/mpid/ch4/shm/src/shm_impl.h:18,
                 from ../src/mpid/ch4/include/mpidch4.h:450,
                 from ../src/mpid/ch4/include/mpidpost.h:10,...
```
## Pull Request Description

This PR is an enhanced version of the progress changes from https://github.com/pmodels/mpich/pull/7368, which introduces `MPIR_CVAR_CH4_PROGRESS_THROTTLE`. When enabled, this CVAR adds `usleep(1)` to the progress loop, preventing cache...
This algorithm achieves better performance than existing bcast algorithms for both small and large message sizes. The algorithm is based on the circulant graph abstraction and Jesper Larsson Träff's recent...
## Pull Request Description

## Author Checklist
* [ ] **Provide Description** Particularly focus on _why_, not _what_. Reference background, issues, test failures, xfail entries, etc.
* [ ] **Commits...