Paul Coffman
I think the reality is that this messaging performance is what it is, and it should really be up to MPI tests like the OSU microbenchmarks to identify bottlenecks in the...
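For reference, a minimal sketch of running the OSU point-to-point microbenchmarks to isolate messaging behavior at different ppn counts. The binary paths and the Hydra-style `-ppn` flag are assumptions; adjust for the local MPI stack and the layout of the installed benchmark tree:

```shell
# Two-rank latency test (assumes the OSU micro-benchmarks are built
# and osu_latency is on the current path)
mpirun -np 2 ./osu_latency

# Multi-pair bandwidth / message-rate test packed onto one node to
# mimic high-ppn pressure (osu_mbw_mr pairs up all ranks)
mpirun -np 96 -ppn 96 ./osu_mbw_mr
```

Comparing the per-pair message rate at low vs. high ppn would show whether the slowdown is in the messaging layer itself rather than in the MPI-IO aggregation logic.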
This issue can be closed.
I think you covered all the angles, Rob.
Thanks for the suggestion @wkliao. For the LAMMPS PnetCDF case we are only running 12 ppn; the data is 3D-decomposition rank-ordered, so we would get some benefit from the...
@roblatham00 has been investigating MPI-IO collective aggregation performance, as have I; this is probably related. Is this only for high ppn, or are you seeing this at, say...
I took a look. Taking the 4 KB message size with the progress throttle on 512 nodes, latency goes from 25 ms for 12 ppn to 327 ms for 96...
I have a version of IOR that, when running against DAOS in MPI-IO mode with collective buffering disabled, writes discontiguous data all over the file; essentially every rank writes...
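For anyone trying to reproduce the collective-buffering-disabled configuration outside the enhanced IOR, the standard ROMIO hints can be set on the file open. A minimal sketch (the file name is a placeholder, and this assumes a ROMIO-based MPI-IO layer where these hints apply):

```c
/* Sketch: open a file through MPI-IO with ROMIO two-phase collective
 * buffering disabled, so collective writes fall through to independent,
 * discontiguous I/O from every rank. Build with an MPI compiler (mpicc). */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    /* Standard ROMIO hints: turn off collective buffering */
    MPI_Info_set(info, "romio_cb_write", "disable");
    MPI_Info_set(info, "romio_cb_read", "disable");

    MPI_File fh;
    /* "testfile" is a placeholder path */
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... with the hints above, a collective write such as
     * MPI_File_write_at_all() from each rank to a strided offset is
     * serviced as independent discontiguous writes ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```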
A couple more details at the messaging level on my enhanced IOR slowdown: we are using erasure encoding in DAOS with 128 KB cells, so in the regular contiguous block...
Actually, I was off by a factor of 2 on the enhanced IOR RMA gets: for 16 ppn the DAOS server does 4 32 KB RMA gets from 4 clients; for 64 ppn...
One more data point: with collective buffering ON for my enhanced IOR, on the read, where each collective buffer distributes data to all the ranks, there is a 100x slowdown...