Xi Luo
## Pull Request Description The MPIDI_IPCI_try_lmt_isend function checks whether the local rank is the same as the recv rank; if it is, IPC P2P falls back to POSIX...
When a large number of MPI_Get calls are issued on a GPU buffer across nodes before an MPI_Win_fence, the program appears to hang. I will share the location of the reproducer...
## Pull Request Description Allow setting up topology-aware CVARs in the collective tuning JSON file for bcast, reduce, and allreduce.
## Pull Request Description Use IPC P2P to move the data for intra-node GPU allgather and allgatherv.
## Pull Request Description Add composition delta for bcast, which can use the direct links between the GPUs in the same node.
## Pull Request Description Fix defects found in Coverity Scan for MPICH-CH4.
## Pull Request Description Add new CVARs to allow MPICH to select GPU-optimized collective algorithms in the JSON tuning file. Depends on: https://github.com/pmodels/mpich/pull/6781
In PR https://github.com/pmodels/mpich/pull/6451, the IPC read bcast and alltoall require MPL_gpu_imemcpy to move the data, but this function is implemented only in mpl_gpu_ze.c (no implementation of this function in CUDA or...
## Pull Request Description Allow the release gather reduce operation to select different trees for small and large messages. Also clean up the code.