
osu_gather significant slowdown from 8 B to 1024 B

Open longfei-austin opened this issue 3 months ago • 3 comments

The following comparison illustrates the issue (on current image):

mpiexec --np 3072 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 --gpu-bind verbose,list:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.1:0.1:0.1:0.1:0.1:0.1:0.1:0.1:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.1:1.1:1.1:1.1:1.1:1.1:1.1:1.1:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.1:2.1:2.1:2.1:2.1:2.1:2.1:2.1:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.1:3.1:3.1:3.1:3.1:3.1:3.1:3.1:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.1:4.1:4.1:4.1:4.1:4.1:4.1:4.1:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.1:5.1:5.1:5.1:5.1:5.1:5.1:5.1 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/32/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-12_18-37-26/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gatherv -m 8:8 -i 1000 -x 100 -f -z -d sycl

# OSU MPI-SYCL Gatherv Latency Test v7.5
# Datatype: MPI_CHAR.
# Size   Avg Latency(us)   Min Latency(us)   Max Latency(us)   Iterations   P50 Tail Lat(us)   P90 Tail Lat(us)   P99 Tail Lat(us)
8        119.12            1.61              124007.70         1000         117.50             123.89             193.58

mpiexec --np 3072 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 --gpu-bind verbose,list:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.1:0.1:0.1:0.1:0.1:0.1:0.1:0.1:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.1:1.1:1.1:1.1:1.1:1.1:1.1:1.1:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.1:2.1:2.1:2.1:2.1:2.1:2.1:2.1:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.1:3.1:3.1:3.1:3.1:3.1:3.1:3.1:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.1:4.1:4.1:4.1:4.1:4.1:4.1:4.1:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.1:5.1:5.1:5.1:5.1:5.1:5.1:5.1 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/32/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-12_18-37-26/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gatherv -m 1024:1024 -i 1000 -x 100 -f -z -d sycl

# OSU MPI-SYCL Gatherv Latency Test v7.5
# Datatype: MPI_CHAR.
# Size   Avg Latency(us)   Min Latency(us)   Max Latency(us)   Iterations   P50 Tail Lat(us)   P90 Tail Lat(us)   P99 Tail Lat(us)
1024     94969.64          678.15            116240.08         1000         88296.68           123401.15          441821.83

longfei-austin, Sep 30 '25 04:09

mpiexec --np 12288 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 --gpu-bind verbose,list:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.1:0.1:0.1:0.1:0.1:0.1:0.1:0.1:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.1:1.1:1.1:1.1:1.1:1.1:1.1:1.1:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.1:2.1:2.1:2.1:2.1:2.1:2.1:2.1:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.1:3.1:3.1:3.1:3.1:3.1:3.1:3.1:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.1:4.1:4.1:4.1:4.1:4.1:4.1:4.1:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.1:5.1:5.1:5.1:5.1:5.1:5.1:5.1 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/128/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-12_18-37-27/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gatherv -m 8:8 -i 1000 -x 100 -f -z -d sycl

# OSU MPI-SYCL Gatherv Latency Test v7.5
# Datatype: MPI_CHAR.
# Size   Avg Latency(us)   Min Latency(us)   Max Latency(us)   Iterations   P50 Tail Lat(us)   P90 Tail Lat(us)   P99 Tail Lat(us)
8        102.16            1.70              288610.67         1000         89.81              109.57             122.85

mpiexec --np 12288 --ppn 96 --cpu-bind verbose,list:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:18:19:20:21:22:23:24:25:26:27:28:29:30:31:32:33:35:36:37:38:39:40:41:42:43:44:45:46:47:48:49:50:53:54:55:56:57:58:59:60:61:62:63:64:65:66:67:68:70:71:72:73:74:75:76:77:78:79:80:81:82:83:84:85:87:88:89:90:91:92:93:94:95:96:97:98:99:100:101:102 --gpu-bind verbose,list:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.0:0.1:0.1:0.1:0.1:0.1:0.1:0.1:0.1:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.0:1.1:1.1:1.1:1.1:1.1:1.1:1.1:1.1:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.0:2.1:2.1:2.1:2.1:2.1:2.1:2.1:2.1:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.0:3.1:3.1:3.1:3.1:3.1:3.1:3.1:3.1:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.0:4.1:4.1:4.1:4.1:4.1:4.1:4.1:4.1:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.0:5.1:5.1:5.1:5.1:5.1:5.1:5.1:5.1 /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/128/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-12_18-37-27/aurora/compute/PrgEnv-intel/BuildMPIcollective_93bceebc/binaries/osu_gatherv -m 1024:1024 -i 1000 -x 100 -f -z -d sycl

# OSU MPI-SYCL Gatherv Latency Test v7.5
# Datatype: MPI_CHAR.
# Size   Avg Latency(us)   Min Latency(us)   Max Latency(us)   Iterations   P50 Tail Lat(us)   P90 Tail Lat(us)   P99 Tail Lat(us)
1024     212708.27         8437.28           360412.57         1000         182955.37          339633.23          535149.98
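
To put the four runs side by side, the slowdown can be worked out from the Avg Latency column. The following is only a minimal Python sketch: the latency values are copied from the outputs above, while the `avg_latency_us` table and the `slowdown` helper are hypothetical names introduced for illustration and are not part of the OSU benchmarks or MPICH.

```python
# Avg Latency (us) copied from the four runs above,
# keyed by (number of ranks, message size in bytes).
avg_latency_us = {
    (3072, 8): 119.12,
    (3072, 1024): 94969.64,
    (12288, 8): 102.16,
    (12288, 1024): 212708.27,
}


def slowdown(ranks: int, small: int = 8, large: int = 1024) -> float:
    """Ratio of average latency at `large` bytes to that at `small` bytes."""
    return avg_latency_us[(ranks, large)] / avg_latency_us[(ranks, small)]


if __name__ == "__main__":
    size_ratio = 1024 // 8  # the message is 128x larger
    for ranks in (3072, 12288):
        print(f"{ranks} ranks: {slowdown(ranks):.0f}x slower "
              f"for a {size_ratio}x larger message")
```

With these numbers the average latency grows by roughly 800x at 3072 ranks and roughly 2000x at 12288 ranks, while the message size grows by only 128x, so the increase is well beyond proportional to the payload.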

longfei-austin, Sep 30 '25 04:09

Please show the complete benchmark results from 1 byte to 4096 bytes.

hzhou, Oct 07 '25 18:10

We only tested 8 B, 1024 B, and 4096 B. All results can be found in this file: /lus/flare/projects/Aurora_testing/mpi/osu_rfm/run_collective/32/gather-gather_persistent-gatherv-gatherv_persistent/stage/2025-09-12_18-37-26/aurora/compute/PrgEnv-intel/RunMPIcollective/rfm_job.out. Feel free to take a look.

longfei-austin, Oct 09 '25 19:10