Paul Coffman

Results: 23 comments by Paul Coffman

I think multi-threading would benefit the read the most, as the aggregators there are very sensitive to compute and network-injection bandwidth, with the data going from just a few...

On a rank designated as an aggregator there is 1 window over the collective buffer --- there is at most 1 aggregator for a given rank so to speak. Depending...

@raffenet I don't have any actual user requests for this, but I would imagine if a user was doing MPI with gpu buffers to avoid the overhead of copying back...

@roblatham00 yes, write_at_all works with collective buffering enabled. However, if I disable it with the romio_cb_write hint, it fails with a bad address for me within IOR, but for some...
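For readers who want to reproduce the non-collective-buffering case, one way to set the `romio_cb_write` hint without changing application code is a ROMIO hints file referenced by the `ROMIO_HINTS` environment variable. This is a minimal sketch, assuming a ROMIO-based MPI-IO layer such as the one in MPICH; the file location is a hypothetical choice for illustration, and the IOR invocation itself is not shown because the comment does not give it.

```shell
# Hypothetical location for the hints file; any readable path should work.
hints_file="${TMPDIR:-/tmp}/romio_hints.txt"

# A ROMIO hints file holds one "key value" pair per line.
cat > "$hints_file" <<'EOF'
romio_cb_write disable
EOF

# ROMIO reads the file named by this variable when a file is opened.
export ROMIO_HINTS="$hints_file"

# Show the hints that will be applied.
cat "$ROMIO_HINTS"
```

Removing the file or unsetting `ROMIO_HINTS` returns to the default behavior, which makes it easy to compare the two cases back to back.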

@roblatham00 @colleeneb So write_at_all with collective buffering works because the collective buffer is CPU memory on the host; the problem is that with independent IO the file write will be given...

@roblatham00 yeah imo safest to use the collective buffer if the rank is an aggregator, if not then allocate the scratch buffer of the cb size on the cpu and...

Using @colleeneb's 7108 mpich build, this issue is resolved with the IOR test.

@raffenet advised me to NOT unset the collective tuning json vars, I did so: `pkcoff@x1921c3s4b0n0:/lus/gila/projects/Aurora_deployment/pkcoff/tarurundir> echo $MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE /soft/restricted/CNDA/updates/mpich/tuning/20230818-1024/CH4_coll_tuning.json pkcoff@x1921c3s4b0n0:/lus/gila/projects/Aurora_deployment/pkcoff/tarurundir> echo $MPIR_CVAR_CH4_POSIX_COLL_SELECTION_TUNING_JSON_FILE /soft/restricted/CNDA/updates/mpich/20231026/mpich-ofi-all-icc-default-pmix-gpu-drop20231026/json-files/POSIX_coll_tuning.json pkcoff@x1921c3s4b0n0:/lus/gila/projects/Aurora_deployment/pkcoff/tarurundir> echo $MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE /soft/restricted/CNDA/updates/mpich/tuning/20230818-1024/MPIR_Coll_tuning.json pkcoff@x1921c3s4b0n0:/lus/gila/projects/Aurora_deployment/pkcoff/tarurundir> ` But that...

At the advice of Ken I also unset all these: ` unset MPIR_CVAR_ENABLE_GPU unset MPIR_CVAR_BCAST_POSIX_INTRA_ALGORITHM unset MPIR_CVAR_ALLREDUCE_POSIX_INTRA_ALGORITHM unset MPIR_CVAR_BARRIER_POSIX_INTRA_ALGORITHM unset MPIR_CVAR_REDUCE_POSIX_INTRA_ALGORITHM ` But performance was actually a bit worse: `POSIX...
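When repeating this kind of experiment, it can help to clear the variables in a loop and confirm that none are still exported before launching the job. The following is only a sketch: the variable names are the ones listed in the comment above, while the loop and the `still_set` check are illustrative additions.

```shell
# Variables named in the comment above.
vars="MPIR_CVAR_ENABLE_GPU
MPIR_CVAR_BCAST_POSIX_INTRA_ALGORITHM
MPIR_CVAR_ALLREDUCE_POSIX_INTRA_ALGORITHM
MPIR_CVAR_BARRIER_POSIX_INTRA_ALGORITHM
MPIR_CVAR_REDUCE_POSIX_INTRA_ALGORITHM"

# Unset each one; unsetting a variable that is not set is harmless.
for v in $vars; do
  unset "$v"
done

# Count how many of them remain in the exported environment.
still_set=$(env | grep -c -E '^MPIR_CVAR_(ENABLE_GPU|(BCAST|ALLREDUCE|BARRIER|REDUCE)_POSIX_INTRA_ALGORITHM)=' || true)
echo "still set: $still_set"
```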

@raffenet There is similar behavior on Aurora with DAOS; I only opened it on Sunspot and Gila because they were not available when I had time to open the issue....