osu_gather/osu_scatter show large variance at 32 nodes 96 ppn
On both the current and next-eval images, osu_gather and osu_gatherv exhibit large variance in latency at 32 nodes with 96 ranks per node, for example:
mpiexec --np 3072 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 --gpu-bind verbose,list:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.1:0.1:0.1:0.1:0.1:0.1:0.1:0.1:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.1:1.1:1.1:1.1:1.1:1.1:1.1:1.1:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.1:2.1:2.1:2.1:2.1:2.1:2.1:2.1:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.1:3.1:3.1:3.1:3.1:3.1:3.1:3.1:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.1:4.1:4.1:4.1:4.1:4.1:4.1:4.1:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.1:5.1:5.1:5.1:5.1:5.1:5.1:5.1 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/32/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-12_18-37-26/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gatherv -m 8:8 -i 1000 -x 100 -f -z -d sycl
# OSU MPI-SYCL Gatherv Latency Test v7.5
# Datatype: MPI_CHAR.
# Size  Avg Latency(us)  Min Latency(us)  Max Latency(us)  Iterations  P50 Tail Lat(us)  P90 Tail Lat(us)  P99 Tail Lat(us)
8       119.12           1.61             124007.70        1000        117.50            123.89            193.58
module load mpich-config/collective-tuning/1024 ; MPIR_CVAR_CH4_PROGRESS_THROTTLE=1 ; mpiexec --np 3072 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/32/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-14_21-51-44/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gather -m 4096:4096 -i 1000 -x 100 -f -z ; unset MPIR_CVAR_CH4_PROGRESS_THROTTLE ; module unload mpich-config/collective-tuning/1024
# OSU MPI Gather Latency Test v7.5
# Datatype: MPI_CHAR.
# Size  Avg Latency(us)  Min Latency(us)  Max Latency(us)  Iterations  P50 Tail Lat(us)  P90 Tail Lat(us)  P99 Tail Lat(us)
4096    12.31            0.76             1838.37          1000        12.25             12.36             12.67
The same large variance also shows up in osu_scatter.
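
To put a number on the spread in the two result rows above, here is a minimal sketch that parses a data row and reports how far the maximum latency sits from the P99 and average values. The helper names (parse_osu_row, spread) are hypothetical and not part of the OSU suite. The column order is assumed to match the header printed by these runs with -f -z. The ratios only describe the gap; they do not say whether the outliers come from particular ranks or particular iterations.

```python
# Column order assumed from the header printed by the runs above (-f -z):
# Size, Avg, Min, Max, Iterations, P50, P90, P99
KEYS = ["size", "avg", "min", "max", "iterations", "p50", "p90", "p99"]


def parse_osu_row(row: str) -> dict:
    """Parse one whitespace-separated data row of OSU latency output."""
    fields = row.split()
    if len(fields) != len(KEYS):
        raise ValueError(f"expected {len(KEYS)} columns, got {len(fields)}")
    return dict(zip(KEYS, (float(f) for f in fields)))


def spread(row: str) -> dict:
    """Return ratios of the maximum latency to the P99 and average latency."""
    r = parse_osu_row(row)
    return {
        "max_over_p99": r["max"] / r["p99"],
        "max_over_avg": r["max"] / r["avg"],
    }


# Data rows copied from the two runs reported above.
rows = {
    "osu_gatherv, 8 B, -d sycl": "8 119.12 1.61 124007.70 1000 117.50 123.89 193.58",
    "osu_gather, 4096 B": "4096 12.31 0.76 1838.37 1000 12.25 12.36 12.67",
}

for label, row in rows.items():
    s = spread(row)
    print(f"{label}: max/P99 = {s['max_over_p99']:.1f}, "
          f"max/avg = {s['max_over_avg']:.1f}")
```

In both runs P50, P90 and P99 stay close to the average while the maximum is two to three orders of magnitude higher, which is the variance being reported here.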