Hui Zhou
test:mpich/custom netmod: ch4:ofi env: VERBOSE=1 env: FI_LOG_LEVEL=info env: MPITEST_IGNORE_OUTPUT=1
test:mpich/custom netmod: ch4:ofi env: V=1 env: FI_LOG_LEVEL=info env: MPITEST_IGNORE_OUTPUT=1
NOTES: Call path: `MPID_Get` -> `MPIDI_POSIX_do_get` -> `MPIR_Ilocalcopy_gpu` -> `MPL_gpu_fast_memcpy` -> `MPL_ze_mmap_device_pointer` -> `mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fds[0], 0);` -> `EINVAL` because `size` is too large. I believe...
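A hedged illustration of the failure mode (not the ZE dma-buf path itself): requesting a mapping larger than the backing object is rejected. Python's `mmap` surfaces this as a `ValueError` before issuing the syscall, whereas the dma-buf path in the call chain above gets `EINVAL` back from `mmap(2)`.

```python
import mmap
import tempfile

# Illustrative only: a 4 KiB backing object, with a map request twice its size.
with tempfile.TemporaryFile() as f:
    f.truncate(4096)                      # backing object is one 4 KiB page
    try:
        mmap.mmap(f.fileno(), 2 * 4096)   # requested size exceeds the object
    except ValueError as e:
        print("map rejected:", e)
```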
The `scatter_ring_allgather` algorithm splits the "large" message into chunks and then runs `P-1` rounds around the ring. Obviously, when `P` is this large (512 x 12), we are over-splitting...
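A back-of-the-envelope sketch of the over-splitting (variable names are illustrative, not MPICH's actual code): the message is split into roughly `P` chunks, so at 512 x 12 = 6144 processes even a multi-megabyte message degenerates into tiny per-round transfers.

```python
def chunk_size(nbytes, nprocs):
    # scatter_ring_allgather splits the message into ~nprocs chunks;
    # each of the nprocs-1 ring rounds forwards one chunk to the next rank.
    return nbytes // nprocs

P = 512 * 12                    # 6144 processes, as in the report
msg = 16 * 1024 * 1024          # a hypothetical 16 MiB "large" message
print(chunk_size(msg, P))       # ~2730 bytes per chunk: far too small
```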
https://github.com/pmodels/mpich/pull/7516 should fix the crash.
> Did you also check the performance?

I got ~15 GB/sec. Haven't investigated further yet.
@rithwiktom The original pipeline algorithm will be replaced in https://github.com/pmodels/mpich/pull/7529. Could you test and evaluate the performance of PR7529? You need to set `MPIR_CVAR_CH4_OFI_EAGER_THRESHOLD` to enable the pipeline path in the...
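For reference, a usage sketch of setting the CVAR at launch time (the threshold value and benchmark binary here are illustrative, not a recommended tuning):

```shell
# Lower the eager threshold so messages above 64 KiB take the pipeline path.
MPIR_CVAR_CH4_OFI_EAGER_THRESHOLD=65536 mpiexec -n 2 ./osu_bw
```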
I suspect https://github.com/pmodels/mpich/pull/7168/commits/05883b6a6c652bb1cbdcd81f68fa34c9f27e0445 is the cause of the performance change, at least in the low-to-medium message range. The jump at `32768` is indicative of the `MPIR_CVAR_GPU_FAST_COPY_MAX_SIZE_D2H` threshold added in that commit.
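A minimal sketch of why a threshold CVAR produces a step in the curve (illustrative only, not MPICH code; the cutoff value is assumed to match the observed jump at `32768`): messages at or below the cutoff take the host-mapped fast-memcpy path, and anything larger falls back to the engine copy, so the bandwidth curve changes character right at the boundary.

```python
# Assumed cutoff, chosen to match the jump observed at message size 32768.
FAST_COPY_MAX_SIZE_D2H = 32768

def pick_copy_path(nbytes):
    # Size-based dispatch: small D2H copies via mapped fast memcpy,
    # large ones via the copy engine.
    if nbytes <= FAST_COPY_MAX_SIZE_D2H:
        return "fast_memcpy"
    return "engine_copy"

print(pick_copy_path(16384))   # fast_memcpy
print(pick_copy_path(65536))   # engine_copy
```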
Thanks for letting us know. Best wishes!
Yes, the PR has been merged into `main`.