Hui Zhou
We found the bug: * With a sufficient message size and a sufficient number of processes, MPI_Gather goes to the gather_intra_binomial algorithm, which does message combination. The algorithm constructs a `struct`...
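For readers unfamiliar with the pattern, here is a minimal sketch of the textbook binomial-tree gather schedule: each non-root rank forwards its own block combined with the blocks it has already collected from its subtree. This is not MPICH's `gather_intra_binomial` code, and the function names `binomial_parent` and `binomial_subtree_blocks` are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

/* Textbook binomial gather tree; illustrative only, NOT taken from MPICH.
 * Both helper names are hypothetical. */

/* Parent of `rank` in a binomial tree rooted at `root`; -1 for the root. */
int binomial_parent(int rank, int root, int size)
{
    int rel = (rank - root + size) % size;  /* rank relative to the root */
    if (rel == 0)
        return -1;
    int lowbit = rel & -rel;                /* lowest set bit of rel */
    return ((rel - lowbit) + root) % size;  /* clear that bit, map back */
}

/* Number of per-rank blocks (its own included) that `rank` holds when it
 * sends to its parent; for the root, the total it ends up with. */
int binomial_subtree_blocks(int rank, int root, int size)
{
    int rel = (rank - root + size) % size;
    if (rel == 0)
        return size;
    int lowbit = rel & -rel;
    int remaining = size - rel;             /* ranks left at or after rel */
    return lowbit < remaining ? lowbit : remaining;
}

/* Print the send schedule for a communicator of `size` ranks. */
void print_binomial_gather(int root, int size)
{
    for (int rank = 0; rank < size; rank++) {
        int parent = binomial_parent(rank, root, size);
        if (parent >= 0)
            printf("rank %d sends %d block(s) to rank %d\n",
                   rank, binomial_subtree_blocks(rank, root, size), parent);
    }
}
```

Calling `print_binomial_gather(0, 8)` lists one send per non-root rank. Intermediate ranks such as 4 forward several blocks in one combined message, which is the kind of message combination the comment refers to.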
Thanks for creating the issue. MPICH itself does not have a Valgrind suppression file. We may have had one in the past, but I don't see any now. The ones you found is...
It depends on what noise you are getting with Valgrind. For example, with ucx, I think you can grab the one from the ucx distribution. If you don't really care about which...
We have not tested with the shm provider extensively, so I don't know. The proof is in the pudding. We currently use `-fsanitize=address` in our CI testing.
@longfei-austin 8 nodes at what PPN?
I am using 8 nodes and the `aurora_test` branch, running `osu_igather` - it works for me all the way up to 96 ppn.
I agree the event should observe the enqueued operations. I need to examine the implementation. Do you have a reproducer?
Are you running at 96 PPN? I suspect the error is a libfabric I/O error, similar to the other issues when the network is overwhelmed and libcxi bails out.
Please show the complete benchmark results from 1 byte to 4096 bytes.
Are these results considered acceptable? ``` write 175063.57 119909.89 155586.28 19059.75 2735.37 1873.59 2431.04 297.81 6.85890 NA NA 0 16384 64 5 0 1 1 0 0 1 67108864...