ROCm symmem allreduce is slower than fbgemm car allreduce

Open · xw285cornell opened this issue 7 months ago · 4 comments

python3 fbgemm_gpu/experimental/gen_ai/bench/comm_bench.py --num_iters=20 --export_csv
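As a sanity check on the output below, the `*_bwidth` columns appear to be algorithm bandwidth, meaning input bytes divided by elapsed time. The sketch reproduces them under assumptions inferred from the numbers, not read from comm_bench.py: `*_time` is in milliseconds, each element is 2 bytes (e.g. bf16), and bandwidth is in GB/s. `algbw_gbps` is a hypothetical helper name.

```python
# Hypothetical helper: recompute a "*_bwidth" value from the matching "*_time".
# Assumptions (inferred from the reported numbers, not from comm_bench.py):
# time is in milliseconds, elements are 2 bytes, bandwidth is in GB/s (1e9 B/s).
BYTES_PER_ELEM = 2


def algbw_gbps(n_elems: int, time_ms: float) -> float:
    """Algorithm bandwidth: bytes in the input tensor divided by elapsed time."""
    nbytes = n_elems * BYTES_PER_ELEM
    return nbytes / (time_ms * 1e-3) / 1e9


# Times copied from the first run below (fbgemm 1-shot at N=1024, nccl at N=16777216).
print(algbw_gbps(1024, 0.009477099776268006))     # reported: 0.21609986687366958
print(algbw_gbps(16777216, 0.26660659313201907))  # reported: 125.85747263716175
```

If these assumptions hold, the bandwidth columns carry no extra information beyond the time columns, so comparing times alone is enough.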

Running benchmark with 8 ranks [{'N': 1024, 'fbgemm_1shot_bwidth': 0.21609986687366958, 'fbgemm_1shot_time': 0.009477099776268006, 'fbgemm_2shot_bwidth': 0.16966774973717594, 'fbgemm_2shot_time': 0.012070649862289428, 'nccl_bwidth': 0.07660384301398733, 'nccl_time': 0.026734951138496398, 'symm_1shot_bwidth': 0.170175283451603, 'symm_1shot_time': 0.012034650146961211, 'symm_2shot_bwidth': 0.08925419349643432, 'symm_2shot_time': 0.02294570058584213}, {'N': 2048, 'fbgemm_1shot_bwidth': 0.43514287842525456, 'fbgemm_1shot_time': 0.00941300019621849, 'fbgemm_2shot_bwidth': 0.3551871171121755, 'fbgemm_2shot_time': 0.011531949788331986, 'nccl_bwidth': 0.1554789594685982, 'nccl_time': 0.02634440064430237, 'symm_1shot_bwidth': 0.35310191623477044, 'symm_1shot_time': 0.011600050330162048, 'symm_2shot_bwidth': 0.22222825536717503, 'symm_2shot_time': 0.01843149960041046}, {'N': 3072, 'fbgemm_1shot_bwidth': 0.525390915270513, 'fbgemm_1shot_time': 0.011694149672985077, 'fbgemm_2shot_bwidth': 0.4785140100526773, 'fbgemm_2shot_time': 0.012839749455451965, 'nccl_bwidth': 0.2297084378804756, 'nccl_time': 0.026746949553489684, 'symm_1shot_bwidth': 0.35896866105898656, 'symm_1shot_time': 0.017115700244903564, 'symm_2shot_bwidth': 0.2659291259274165, 'symm_2shot_time': 0.023103900253772736}, {'N': 5120, 'fbgemm_1shot_bwidth': 0.8990539548470523, 'fbgemm_1shot_time': 0.011389750242233276, 'fbgemm_2shot_bwidth': 0.8000218486792321, 'fbgemm_2shot_time': 0.012799650430679321, 'nccl_bwidth': 0.36599335910858, 'nccl_time': 0.027978649735450743, 'symm_1shot_bwidth': 0.5619842770453108, 'symm_1shot_time': 0.01822115033864975, 'symm_2shot_bwidth': 0.4456872955518811, 'symm_2shot_time': 0.022975750267505646}, {'N': 8192, 'fbgemm_1shot_bwidth': 1.2499093444780796, 'fbgemm_1shot_time': 0.013108150660991668, 'fbgemm_2shot_bwidth': 1.4509901141387298, 'fbgemm_2shot_time': 0.0112916000187397, 'nccl_bwidth': 0.5873969757073986, 'nccl_time': 0.02789255082607269, 'symm_1shot_bwidth': 0.8803987029137805, 
'symm_1shot_time': 0.018609750270843505, 'symm_2shot_bwidth': 0.8761567539624059, 'symm_2shot_time': 0.018699850142002105}, {'N': 13312, 'fbgemm_1shot_bwidth': 2.070979859785417, 'fbgemm_1shot_time': 0.01285575032234192, 'fbgemm_2shot_bwidth': 1.9050684909879292, 'fbgemm_2shot_time': 0.013975350558757782, 'nccl_bwidth': 0.6573178024127491, 'nccl_time': 0.04050399959087372, 'symm_1shot_bwidth': 1.4397927865696751, 'symm_1shot_time': 0.018491549789905547, 'symm_2shot_bwidth': 1.0910693027959597, 'symm_2shot_time': 0.024401749670505523}, {'N': 22016, 'fbgemm_1shot_bwidth': 3.2838746891899673, 'fbgemm_1shot_time': 0.01340855062007904, 'fbgemm_2shot_bwidth': 2.9239949083295835, 'fbgemm_2shot_time': 0.015058849751949311, 'nccl_bwidth': 1.0965372068502555, 'nccl_time': 0.040155500173568726, 'symm_1shot_bwidth': 2.325527266142826, 'symm_1shot_time': 0.018934200704097747, 'symm_2shot_bwidth': 1.676240917984534, 'symm_2shot_time': 0.02626830041408539}, {'N': 36864, 'fbgemm_1shot_bwidth': 5.0449560710253785, 'fbgemm_1shot_time': 0.014614200592041016, 'fbgemm_2shot_bwidth': 4.492678697797537, 'fbgemm_2shot_time': 0.01641069948673248, 'nccl_bwidth': 1.8563382621278357, 'nccl_time': 0.039716899394989014, 'symm_1shot_bwidth': 3.6662722561857475, 'symm_1shot_time': 0.02010979950428009, 'symm_2shot_bwidth': 2.576136692833097, 'symm_2shot_time': 0.028619599342346192}, {'N': 60928, 'fbgemm_1shot_bwidth': 7.1004416767862955, 'fbgemm_1shot_time': 0.017161749303340912, 'fbgemm_2shot_bwidth': 7.395611001240528, 'fbgemm_2shot_time': 0.016476799547672272, 'nccl_bwidth': 3.0440142797610417, 'nccl_time': 0.04003134965896606, 'symm_1shot_bwidth': 5.5560065810869395, 'symm_1shot_time': 0.0219322994351387, 'symm_2shot_bwidth': 3.7497672823536177, 'symm_2shot_time': 0.03249695003032684}, {'N': 101888, 'fbgemm_1shot_bwidth': 7.52567135579981, 'fbgemm_1shot_time': 0.02707745134830475, 'fbgemm_2shot_bwidth': 12.252781139887329, 'fbgemm_2shot_time': 0.016630999743938446, 'nccl_bwidth': 
5.143423070261001, 'nccl_time': 0.039618751406669615, 'symm_1shot_bwidth': 5.5234324500225656, 'symm_1shot_time': 0.036893001198768614, 'symm_2shot_bwidth': 6.164625373417774, 'symm_2shot_time': 0.03305569887161255}, {'N': 169472, 'fbgemm_1shot_bwidth': 11.572596536697198, 'fbgemm_1shot_time': 0.029288500547409058, 'fbgemm_2shot_bwidth': 19.819374223474767, 'fbgemm_2shot_time': 0.017101649940013886, 'nccl_bwidth': 8.457657216412821, 'nccl_time': 0.04007540047168732, 'symm_1shot_bwidth': 8.711914555772593, 'symm_1shot_time': 0.0389057993888855, 'symm_2shot_bwidth': 10.132173903266294, 'symm_2shot_time': 0.03345224857330322}, {'N': 282112, 'fbgemm_1shot_bwidth': 14.514267350468026, 'fbgemm_1shot_time': 0.03887374997138977, 'fbgemm_2shot_bwidth': 30.182333589369023, 'fbgemm_2shot_time': 0.018693849444389343, 'nccl_bwidth': 10.027476115773338, 'nccl_time': 0.05626779794692993, 'symm_1shot_bwidth': 11.01165542448017, 'symm_1shot_time': 0.05123879909515381, 'symm_2shot_bwidth': 16.37919554493729, 'symm_2shot_time': 0.03444760143756866}, {'N': 470016, 'fbgemm_1shot_bwidth': 17.45634498163174, 'fbgemm_1shot_time': 0.05385044813156128, 'fbgemm_2shot_bwidth': 41.83674934540085, 'fbgemm_2shot_time': 0.02246904969215393, 'nccl_bwidth': 15.87574104439456, 'nccl_time': 0.0592118501663208, 'symm_1shot_bwidth': 12.78648046179834, 'symm_1shot_time': 0.07351765036582947, 'symm_2shot_bwidth': 24.260424797888245, 'symm_2shot_time': 0.038747549057006836}, {'N': 783360, 'fbgemm_1shot_bwidth': 20.50852472580541, 'fbgemm_1shot_time': 0.07639359831809997, 'fbgemm_2shot_bwidth': 53.66150443894756, 'fbgemm_2shot_time': 0.029196348786354066, 'nccl_bwidth': 25.079416611998376, 'nccl_time': 0.06247035264968872, 'symm_1shot_bwidth': 15.194860304031323, 'symm_1shot_time': 0.10310854911804199, 'symm_2shot_bwidth': 33.47785294988142, 'symm_2shot_time': 0.046798700094223024}, {'N': 1305600, 'fbgemm_1shot_bwidth': 22.18470798606954, 'fbgemm_1shot_time': 0.11770269870758057, 'fbgemm_2shot_bwidth': 
68.0157616602996, 'fbgemm_2shot_time': 0.03839110136032105, 'nccl_bwidth': 38.62065020183232, 'nccl_time': 0.06761149764060974, 'symm_1shot_bwidth': 16.140472148493032, 'symm_1shot_time': 0.1617796540260315, 'symm_2shot_bwidth': 43.0650873942165, 'symm_2shot_time': 0.06063380241394043}, {'N': 2175488, 'fbgemm_1shot_bwidth': 24.031862889072666, 'fbgemm_1shot_time': 0.18105030059814453, 'fbgemm_2shot_bwidth': 82.54413751013578, 'fbgemm_2shot_time': 0.05271090269088745, 'nccl_bwidth': 53.38840309434399, 'nccl_time': 0.08149664998054504, 'symm_1shot_bwidth': 17.47448681082015, 'symm_1shot_time': 0.24899020195007324, 'symm_2shot_bwidth': 55.25550585951041, 'symm_2shot_time': 0.07874284982681275}, {'N': 3624960, 'fbgemm_1shot_bwidth': 25.906921647574368, 'fbgemm_1shot_time': 0.2798449039459229, 'fbgemm_2shot_bwidth': 95.88503412616686, 'fbgemm_2shot_time': 0.07561054825782776, 'nccl_bwidth': 72.02587967759287, 'nccl_time': 0.10065715312957764, 'symm_1shot_bwidth': 18.385170863194713, 'symm_1shot_time': 0.3943351984024048, 'symm_2shot_bwidth': 68.79806978521225, 'symm_2shot_time': 0.10537970066070557}, {'N': 6041088, 'fbgemm_1shot_bwidth': 26.79060979365397, 'fbgemm_1shot_time': 0.4509854793548584, 'fbgemm_2shot_bwidth': 103.35161348319318, 'fbgemm_2shot_time': 0.11690360307693481, 'nccl_bwidth': 94.66975802722887, 'nccl_time': 0.12762445211410522, 'symm_1shot_bwidth': 18.768079801355, 'symm_1shot_time': 0.6437619686126709, 'symm_2shot_bwidth': 81.9476150881304, 'symm_2shot_time': 0.14743779897689818}, {'N': 10067456, 'fbgemm_1shot_bwidth': 0.12319780093105741, 'fbgemm_1shot_time': 163.43564453125, 'fbgemm_2shot_bwidth': 108.34488796279796, 'fbgemm_2shot_time': 0.185840904712677, 'nccl_bwidth': 106.14290873427711, 'nccl_time': 0.18969625234603882, 'symm_1shot_bwidth': 19.017884088301123, 'symm_1shot_time': 1.0587356567382813, 'symm_2shot_bwidth': 84.9453555333993, 'symm_2shot_time': 0.23703370094299317}, {'N': 16777216, 'fbgemm_1shot_bwidth': 28.02392784267898, 
'fbgemm_1shot_time': 1.1973493576049805, 'fbgemm_2shot_bwidth': 111.08473038016218, 'fbgemm_2shot_time': 0.3020616054534912, 'nccl_bwidth': 125.85747263716175, 'nccl_time': 0.26660659313201907, 'symm_1shot_bwidth': 19.2322950070967, 'symm_1shot_time': 1.744692039489746, 'symm_2shot_bwidth': 90.17912823748816, 'symm_2shot_time': 0.37208645343780516}]
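A small sketch for reading these dumps, assuming they are pasted as Python literals: the hypothetical `slowdown` helper divides one implementation's time by another's at each N, so values above 1 mean the first is slower. Only the smallest and largest rows of the run above are excerpted (times only, copied verbatim).

```python
from typing import Dict, List


def slowdown(rows: List[Dict[str, float]], a: str, b: str) -> Dict[int, float]:
    """Return {N: time_a / time_b}; values > 1 mean `a` is slower than `b`."""
    return {int(r["N"]): r[f"{a}_time"] / r[f"{b}_time"] for r in rows}


# Two rows excerpted from the first run (times only).
run1 = [
    {"N": 1024,
     "fbgemm_1shot_time": 0.009477099776268006,
     "fbgemm_2shot_time": 0.012070649862289428,
     "symm_1shot_time": 0.012034650146961211,
     "symm_2shot_time": 0.02294570058584213},
    {"N": 16777216,
     "fbgemm_1shot_time": 1.1973493576049805,
     "fbgemm_2shot_time": 0.3020616054534912,
     "symm_1shot_time": 1.744692039489746,
     "symm_2shot_time": 0.37208645343780516},
]

for shot in ("1shot", "2shot"):
    for n, ratio in slowdown(run1, f"symm_{shot}", f"fbgemm_{shot}").items():
        print(f"{shot} N={n}: symm/fbgemm time = {ratio:.2f}x")
```

Running the same helper over the full list gives the per-size picture behind the issue title.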

xw285cornell, May 03 '25 19:05

cc: @q10 @ionuthristodorescu

spcyppt, Jun 11 '25 22:06

With https://github.com/pytorch/pytorch/pull/155587, it appears 1-shot is better, but 2-shot still seems worse at larger sizes.

[{'N': 1024, 'fbgemm_1shot_bwidth': 0.21410948217794498, 'fbgemm_1shot_time': 0.00956519991159439, 'fbgemm_2shot_bwidth': 0.15819741225093126, 'fbgemm_2shot_time': 0.01294585019350052, 'nccl_bwidth': 0.00892791974248316, 'nccl_time': 0.22939274311065674, 'symm_1shot_bwidth': 0.18408409625592814, 'symm_1shot_time': 0.01112534999847412, 'symm_2shot_bwidth': 0.1216350673029192, 'symm_2shot_time': 0.01683724969625473}, {'N': 2048, 'fbgemm_1shot_bwidth': 0.435605656520708, 'fbgemm_1shot_time': 0.0094030000269413, 'fbgemm_2shot_bwidth': 0.377966127090978, 'fbgemm_2shot_time': 0.010836949944496155, 'nccl_bwidth': 0.018088785429888312, 'nccl_time': 0.22643864154815674, 'symm_1shot_bwidth': 0.4028324031133285, 'symm_1shot_time': 0.010168000310659408, 'symm_2shot_bwidth': 0.2644739004126182, 'symm_2shot_time': 0.015487350523471832}, {'N': 3072, 'fbgemm_1shot_bwidth': 0.5247604253096506, 'fbgemm_1shot_time': 0.011708199977874756, 'fbgemm_2shot_bwidth': 0.5143789875182428, 'fbgemm_2shot_time': 0.011944500356912613, 'nccl_bwidth': 0.0273275014170218, 'nccl_time': 0.22482845783233643, 'symm_1shot_bwidth': 0.633702561151026, 'symm_1shot_time': 0.009695400297641755, 'symm_2shot_bwidth': 0.40621754460550286, 'symm_2shot_time': 0.01512490063905716}, {'N': 5120, 'fbgemm_1shot_bwidth': 0.9015909848086928, 'fbgemm_1shot_time': 0.011357700079679489, 'fbgemm_2shot_bwidth': 0.7702520469054789, 'fbgemm_2shot_time': 0.01329434961080551, 'nccl_bwidth': 0.04518199325536584, 'nccl_time': 0.22663896083831786, 'symm_1shot_bwidth': 0.9621618667090439, 'symm_1shot_time': 0.010642699897289276, 'symm_2shot_bwidth': 0.6603299054852385, 'symm_2shot_time': 0.015507400035858154}, {'N': 8192, 'fbgemm_1shot_bwidth': 1.2689168490017615, 'fbgemm_1shot_time': 0.012911799550056457, 'fbgemm_2shot_bwidth': 1.4535647525525521, 'fbgemm_2shot_time': 0.011271599680185318, 'nccl_bwidth': 0.07182595653890322, 'nccl_time': 0.22810695171356202, 'symm_1shot_bwidth': 1.2899264412719538, 'symm_1shot_time': 
0.012701499462127685, 'symm_2shot_bwidth': 1.0653315309374043, 'symm_2shot_time': 0.015379250049591064}, {'N': 13312, 'fbgemm_1shot_bwidth': 2.070979859785417, 'fbgemm_1shot_time': 0.01285575032234192, 'fbgemm_2shot_bwidth': 1.9984461626503638, 'fbgemm_2shot_time': 0.013322350382804871, 'nccl_bwidth': 0.11167167347582056, 'nccl_time': 0.2384131908416748, 'symm_1shot_bwidth': 2.442781757687772, 'symm_1shot_time': 0.010899049788713455, 'symm_2shot_bwidth': 1.5850260903710918, 'symm_2shot_time': 0.016797199845314026}, {'N': 22016, 'fbgemm_1shot_bwidth': 3.191406693442649, 'fbgemm_1shot_time': 0.013797050714492798, 'fbgemm_2shot_bwidth': 2.9896085883690335, 'fbgemm_2shot_time': 0.014728349447250367, 'nccl_bwidth': 0.18511071819917277, 'nccl_time': 0.23786845207214355, 'symm_1shot_bwidth': 3.4964485739607403, 'symm_1shot_time': 0.012593349814414978, 'symm_2shot_bwidth': 2.395210951742409, 'symm_2shot_time': 0.01838334947824478}, {'N': 36864, 'fbgemm_1shot_bwidth': 5.001781045860826, 'fbgemm_1shot_time': 0.014740349352359771, 'fbgemm_2shot_bwidth': 4.52471686467392, 'fbgemm_2shot_time': 0.016294500231742857, 'nccl_bwidth': 0.3127758913047964, 'nccl_time': 0.23572149276733398, 'symm_1shot_bwidth': 5.831319544318621, 'symm_1shot_time': 0.012643450498580932, 'symm_2shot_bwidth': 3.4762190385060032, 'symm_2shot_time': 0.021209250390529632}, {'N': 60928, 'fbgemm_1shot_bwidth': 6.980711857251513, 'fbgemm_1shot_time': 0.01745609939098358, 'fbgemm_2shot_bwidth': 7.494933471871741, 'fbgemm_2shot_time': 0.016258449852466585, 'nccl_bwidth': 0.5029583632111209, 'nccl_time': 0.24227850437164306, 'symm_1shot_bwidth': 8.53709963023381, 'symm_1shot_time': 0.014273700118064881, 'symm_2shot_bwidth': 4.689682395127824, 'symm_2shot_time': 0.025983849167823793}, {'N': 101888, 'fbgemm_1shot_bwidth': 7.464404592961836, 'fbgemm_1shot_time': 0.027299699187278748, 'fbgemm_2shot_bwidth': 12.301674256992028, 'fbgemm_2shot_time': 0.01656489968299866, 'nccl_bwidth': 0.6101788404025775, 'nccl_time': 
0.33396110534667967, 'symm_1shot_bwidth': 9.138429197038773, 'symm_1shot_time': 0.022298799455165864, 'symm_2shot_bwidth': 7.647297566733347, 'symm_2shot_time': 0.026646798849105834}, {'N': 169472, 'fbgemm_1shot_bwidth': 11.404941086191034, 'fbgemm_1shot_time': 0.029719048738479616, 'fbgemm_2shot_bwidth': 19.831027414798232, 'fbgemm_2shot_time': 0.017091600596904753, 'nccl_bwidth': 1.430982497435245, 'nccl_time': 0.2368610382080078, 'symm_1shot_bwidth': 14.378757908386548, 'symm_1shot_time': 0.023572550714015962, 'symm_2shot_bwidth': 12.463215942433528, 'symm_2shot_time': 0.027195549011230467}, {'N': 282112, 'fbgemm_1shot_bwidth': 14.543524173123425, 'fbgemm_1shot_time': 0.03879554867744446, 'fbgemm_2shot_bwidth': 30.037638389428498, 'fbgemm_2shot_time': 0.018783900141716003, 'nccl_bwidth': 2.2154963709394226, 'nccl_time': 0.25467159748077395, 'symm_1shot_bwidth': 17.715038570162292, 'symm_1shot_time': 0.03185000121593475, 'symm_2shot_bwidth': 19.95907218231074, 'symm_2shot_time': 0.028269049525260926}, {'N': 470016, 'fbgemm_1shot_bwidth': 17.3768829065395, 'fbgemm_1shot_time': 0.05409669876098633, 'fbgemm_2shot_bwidth': 41.83302622392235, 'fbgemm_2shot_time': 0.022471049427986146, 'nccl_bwidth': 3.654225521141287, 'nccl_time': 0.25724520683288576, 'symm_1shot_bwidth': 20.696091099524136, 'symm_1shot_time': 0.045420750975608826, 'symm_2shot_bwidth': 29.214953272198077, 'symm_2shot_time': 0.03217639923095703}, {'N': 783360, 'fbgemm_1shot_bwidth': 20.312060621453913, 'fbgemm_1shot_time': 0.07713249921798707, 'fbgemm_2shot_bwidth': 53.6137555543816, 'fbgemm_2shot_time': 0.02922235131263733, 'nccl_bwidth': 5.462036764520574, 'nccl_time': 0.2868380546569824, 'symm_1shot_bwidth': 23.282790841362893, 'symm_1shot_time': 0.06729090213775635, 'symm_2shot_bwidth': 38.93483031670804, 'symm_2shot_time': 0.0402395486831665}, {'N': 1305600, 'fbgemm_1shot_bwidth': 22.088146345860988, 'fbgemm_1shot_time': 0.11821725368499755, 'fbgemm_2shot_bwidth': 67.74032880217506, 
'fbgemm_2shot_time': 0.03854719996452331, 'nccl_bwidth': 9.81302195155387, 'nccl_time': 0.2660953998565674, 'symm_1shot_bwidth': 24.63434591160876, 'symm_1shot_time': 0.1059983491897583, 'symm_2shot_bwidth': 47.98667945094382, 'symm_2shot_time': 0.05441510081291199}, {'N': 2175488, 'fbgemm_1shot_bwidth': 24.177125264796143, 'fbgemm_1shot_time': 0.1799625039100647, 'fbgemm_2shot_bwidth': 82.41905627413288, 'fbgemm_2shot_time': 0.052790898084640506, 'nccl_bwidth': 15.980847519928208, 'nccl_time': 0.272261905670166, 'symm_1shot_bwidth': 26.66606275980422, 'symm_1shot_time': 0.16316529512405395, 'symm_2shot_bwidth': 62.915832494552824, 'symm_2shot_time': 0.06915550231933594}, {'N': 3624960, 'fbgemm_1shot_bwidth': 25.98376910450393, 'fbgemm_1shot_time': 0.2790172576904297, 'fbgemm_2shot_bwidth': 95.69512012311061, 'fbgemm_2shot_time': 0.0757606029510498, 'nccl_bwidth': 25.017654358143027, 'nccl_time': 0.2897921562194824, 'symm_1shot_bwidth': 28.15945836823435, 'symm_1shot_time': 0.2574594974517822, 'symm_2shot_bwidth': 68.58962920518601, 'symm_2shot_time': 0.10569994449615479}, {'N': 6041088, 'fbgemm_1shot_bwidth': 26.71827890020251, 'fbgemm_1shot_time': 0.4522063732147217, 'fbgemm_2shot_bwidth': 103.91254456897357, 'fbgemm_2shot_time': 0.11627254486083985, 'nccl_bwidth': 38.4748861958845, 'nccl_time': 0.3140275955200195, 'symm_1shot_bwidth': 28.966825046247966, 'symm_1shot_time': 0.417103910446167, 'symm_2shot_bwidth': 83.28267882959946, 'symm_2shot_time': 0.14507429599761962}, {'N': 10067456, 'fbgemm_1shot_bwidth': 27.402095146598, 'fbgemm_1shot_time': 0.7347946166992188, 'fbgemm_2shot_bwidth': 107.90202458416417, 'fbgemm_2shot_time': 0.1866036534309387, 'nccl_bwidth': 56.36254192305505, 'nccl_time': 0.3572392463684082, 'symm_1shot_bwidth': 29.49935177712933, 'symm_1shot_time': 0.6825543880462647, 'symm_2shot_bwidth': 91.28144632017552, 'symm_2shot_time': 0.2205805540084839}, {'N': 16777216, 'fbgemm_1shot_bwidth': 28.00472510174091, 'fbgemm_1shot_time': 
1.1981703758239746, 'fbgemm_2shot_bwidth': 111.44402514695923, 'fbgemm_2shot_time': 0.301087760925293, 'nccl_bwidth': 77.60959523801877, 'nccl_time': 0.43234901428222655, 'symm_1shot_bwidth': 29.939012236481993, 'symm_1shot_time': 1.1207594871520996, 'symm_2shot_bwidth': 99.506747938792, 'symm_2shot_time': 0.33720760345458983}]
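To make the "1-shot is better, 2-shot still worse" reading concrete, here is a sketch comparing the largest size (N=16777216) across the two runs, with times copied verbatim from the outputs above. The fbgemm times are nearly identical in both runs, so they serve as a rough control. `compare` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Times in ms at N=16777216: BEFORE = first run, AFTER = run with pytorch/pytorch#155587.
BEFORE = {"symm_1shot": 1.744692039489746, "symm_2shot": 0.37208645343780516,
          "fbgemm_1shot": 1.1973493576049805, "fbgemm_2shot": 0.3020616054534912}
AFTER = {"symm_1shot": 1.1207594871520996, "symm_2shot": 0.33720760345458983,
         "fbgemm_1shot": 1.1981703758239746, "fbgemm_2shot": 0.301087760925293}


def compare(before: Dict[str, float],
            after: Dict[str, float]) -> Dict[str, Tuple[float, bool]]:
    """Per variant: (symm speedup from the PR, whether symm now beats fbgemm)."""
    result = {}
    for shot in ("1shot", "2shot"):
        gain = before[f"symm_{shot}"] / after[f"symm_{shot}"]
        beats = after[f"symm_{shot}"] < after[f"fbgemm_{shot}"]
        result[shot] = (gain, beats)
    return result


for shot, (gain, beats) in compare(BEFORE, AFTER).items():
    print(f"{shot}: symm is {gain:.2f}x faster than before; beats fbgemm: {beats}")
```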

xw285cornell, Jun 15 '25 07:06

This still seems like a good reason to merge PR https://github.com/pytorch/pytorch/pull/155587, doesn't it? We can address the 2-shot improvement in a later PR. We want to get this in before the PyTorch 2.8 branch cut (6/20). Cc @pragupta @jeffdaily
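For readers less familiar with the two variants compared in this thread: in a one-shot allreduce every rank reads the full buffer from every peer and reduces locally; in a two-shot allreduce each rank reduces only its 1/world_size slice (reduce-scatter) and then the reduced slices are gathered (all-gather). The pure-Python simulation below models only the data movement, not the actual FBGEMM or PyTorch symmetric-memory kernels, and assumes N is divisible by the world size.

```python
from typing import List, Tuple

Buffers = List[List[float]]  # one input buffer per rank


def one_shot(bufs: Buffers) -> Tuple[Buffers, int]:
    """Every rank reads every peer's full buffer and sums locally."""
    world, n = len(bufs), len(bufs[0])
    out = [[sum(b[i] for b in bufs) for i in range(n)] for _ in range(world)]
    # Each rank reads n elements from each of its (world - 1) peers.
    return out, (world - 1) * n


def two_shot(bufs: Buffers) -> Tuple[Buffers, int]:
    """Reduce-scatter (rank r owns slice r) followed by all-gather."""
    world, n = len(bufs), len(bufs[0])
    assert n % world == 0, "sketch assumes N divisible by world size"
    chunk = n // world
    # Phase 1: rank r sums slice r across all ranks.
    reduced = [[sum(b[i] for b in bufs) for i in range(r * chunk, (r + 1) * chunk)]
               for r in range(world)]
    # Phase 2: every rank gathers all reduced slices.
    full = [x for s in reduced for x in s]
    out = [list(full) for _ in range(world)]
    # Phase 1 reads one chunk from each peer; phase 2 reads each peer's reduced chunk.
    return out, 2 * (world - 1) * chunk


bufs = [[float(r * 10 + i) for i in range(8)] for r in range(4)]
print(one_shot(bufs)[1], two_shot(bufs)[1])  # remote elements read per rank
```

Per rank, two-shot reads about world_size/2 times less remote data (4x less with 8 ranks) but adds a second synchronized phase. That trade-off is one common explanation for why fbgemm 1-shot wins at small N and 2-shot wins at large N in the results above.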

jithunnair-amd, Jun 17 '25 19:06

BTW, that PR should be restricted to MI300 and earlier. Starting with MI350, CVT instructions have been added to the ISA.
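One way to express "MI300 and earlier" in code is an explicit allowlist of gfx targets, so newer parts such as MI350 fall back to the default path. This is a hypothetical sketch: the helper names are made up, and the arch mapping (gfx908 = MI100, gfx90a = MI200, gfx940/gfx941/gfx942 = MI300 series, gfx950 = MI350) is an assumption here, not taken from the PR.

```python
# Hypothetical gating helper. Arch-to-product mapping is an assumption:
# gfx908 = MI100, gfx90a = MI200, gfx940/941/942 = MI300 series, gfx950 = MI350.
LEGACY_ARCHES = {"gfx908", "gfx90a", "gfx940", "gfx941", "gfx942"}


def base_arch(gcn_arch_name: str) -> str:
    """Strip feature suffixes, e.g. "gfx942:sramecc+:xnack-" -> "gfx942"."""
    return gcn_arch_name.split(":", 1)[0]


def use_pr_code_path(gcn_arch_name: str) -> bool:
    """True only for MI300 and earlier; newer/unknown arches use the default path."""
    return base_arch(gcn_arch_name) in LEGACY_ARCHES


print(use_pr_code_path("gfx942:sramecc+:xnack-"), use_pr_code_path("gfx950"))
```

On a ROCm build of PyTorch, the arch string should be readable from `torch.cuda.get_device_properties(0).gcnArchName`. An allowlist rather than a denylist keeps future architectures off the workaround by default.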

amd-hhashemi, Jun 18 '25 23:06