Victor Anisimov
[reproducer.tgz](https://github.com/user-attachments/files/16767462/reproducer.tgz) One-sided communications between GPU pointers intermittently hang on Aurora. The attached reproducer runs on 96 nodes using six 16-node subcommunicators. One needs to run about 10-50 jobs in order...
Tests conducted on 144-node runs in the queue alcf_kmd_val show that one-sided communications in MPICH 4.3.0rc2 are 18% slower than those in the default Aurora MPICH. A single-file reproducer...
One-sided communications crash on very small window sizes when using PVC in implicit scaling mode (gpu_dev_compact.sh), with the following error: mmap failed fd: 46 size: 969998336 mmap device to host:...