Implement Allclose

Open hieule88 opened this issue 10 months ago • 0 comments

  • Added Allclose.

  • Added driver test and gtest for Allclose.

  • New API is guarded by MIOPEN_BETA_API macro.

  • Average improvement over ROCm across all cases:

| Type | Allclose |
|---|---|
| float16 | 6.04 |
| float32 | 6.06 |
| bfloat16 | 6.27 |
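
For readers unfamiliar with the operator: allclose, as conventionally defined in NumPy and PyTorch, reports whether every pair of elements satisfies `|a - b| <= atol + rtol * |b|`. The sketch below is a plain Python illustration of that convention, not the MIOpen kernel. The function name, the default tolerances, and the `equal_nan` flag are assumptions borrowed from those libraries and are not confirmed by this PR.

```python
import math
from typing import Sequence


def allclose_reference(a: Sequence[float], b: Sequence[float],
                       rtol: float = 1e-5, atol: float = 1e-8,
                       equal_nan: bool = False) -> bool:
    """Return True if all pairs satisfy |x - y| <= atol + rtol * |y|.

    Illustrative only: defaults mirror NumPy/PyTorch, not necessarily MIOpen.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            # NaNs only match each other, and only when equal_nan is set.
            if equal_nan and math.isnan(x) and math.isnan(y):
                continue
            return False
        if x == y:
            # Covers exact matches, including infinities of the same sign.
            continue
        if math.isinf(x) or math.isinf(y):
            return False
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True
```

Note that the comparison is asymmetric: the relative tolerance is scaled by the second argument, so swapping the inputs can change the result for borderline values.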

FP16
| op_name | dtype | input_size | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|
| AllClose | float16 | [50 100] | 85599 | 24106 | 10.96946818 |
| AllClose | float16 | [100 50] | 80351 | 22186 | 8.721085369 |
| AllClose | float16 | [100 100] | 79855 | 22791 | 10.78873239 |
| AllClose | float16 | [100 300] | 72655 | 22097 | 8.67873467 |
| AllClose | float16 | [300 100] | 71951 | 22969 | 8.561713614 |
| AllClose | float16 | [200 300] | 76288 | 23591 | 10.21737103 |
| AllClose | float16 | [205 350] | 70287 | 30382 | 6.970377197 |
| AllClose | float16 | [350 105] | 80367 | 23644 | 9.371595331 |
| AllClose | float16 | [405 200] | 72704 | 29671 | 6.68282161 |
| AllClose | float16 | [10 10 10] | 82863 | 24746 | 5.899903015 |
| AllClose | float16 | [10 10 30] | 73071 | 26077 | 6.860221651 |
| AllClose | float16 | [10 30 10] | 71087 | 24157 | 9.952146376 |
| AllClose | float16 | [30 10 10] | 71264 | 25437 | 8.543067186 |
| AllClose | float16 | [30 30 30] | 64303 | 25064 | 7.691589531 |
| AllClose | float16 | [50 100 50] | 69951 | 32085 | 6.378993299 |
| AllClose | float16 | [100 50 100] | 74735 | 35623 | 4.566010723 |
| AllClose | float16 | [100 100 100] | 104239 | 49897 | 3.38583482 |
| AllClose | float16 | [100 100 300] | 192270 | 110513 | 2.083791047 |
| AllClose | float16 | [300 100 100] | 189790 | 110193 | 1.765756446 |
| AllClose | float16 | [10 10 10 10] | 82447 | 23713 | 6.867372327 |
| AllClose | float16 | [10 10 10 30] | 73967 | 24157 | 6.524609844 |
| AllClose | float16 | [30 10 10 10] | 73391 | 26913 | 5.638873407 |
| AllClose | float16 | [30 30 30 30] | 80240 | 44120 | 5.412103354 |
| AllClose | float16 | [50 100 50 100] | 1074263 | 768978 | 1.403990751 |
| AllClose | float16 | [100 50 100 50] | 1080566 | 770099 | 1.410049877 |
| AllClose | float16 | [100 100 100 100] | 4089980 | 3016030 | 1.357847236 |
| AllClose | float16 | [100 100 300 100] | 12305426 | 12653600 | 0.972901467 |
| AllClose | float16 | [300 100 100 100] | 12280144 | 9020920 | 1.361886814 |
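
The float16 figure in the summary (6.04) can be reproduced as the arithmetic mean of the last column of the FP16 table. A small check, with the values copied from the rows above (the variable names are illustrative, not taken from the PR):

```python
from statistics import mean

# "improvement over rocm" column of the FP16 table, in row order.
float16_improvement = [
    10.96946818, 8.721085369, 10.78873239, 8.67873467,
    8.561713614, 10.21737103, 6.970377197, 9.371595331,
    6.68282161, 5.899903015, 6.860221651, 9.952146376,
    8.543067186, 7.691589531, 6.378993299, 4.566010723,
    3.38583482, 2.083791047, 1.765756446, 6.867372327,
    6.524609844, 5.638873407, 5.412103354, 1.403990751,
    1.410049877, 1.357847236, 0.972901467, 1.361886814,
]

# Unweighted mean over all 28 input shapes.
mean_improvement = mean(float16_improvement)
print(f"float16 average improvement: {mean_improvement:.2f}")
```

Because the mean is unweighted, the many small shapes with large ratios dominate it; the largest 4-D shapes sit much closer to 1.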

FP32
| op_name | dtype | input_size | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|
| AllClose | float32 | [50 100] | 80160 | 23004 | 9.862545644 |
| AllClose | float32 | [100 50] | 75471 | 23395 | 10.5820047 |
| AllClose | float32 | [100 100] | 75887 | 23164 | 7.548868935 |
| AllClose | float32 | [100 300] | 68943 | 23182 | 9.169096713 |
| AllClose | float32 | [300 100] | 69183 | 23946 | 8.736239873 |
| AllClose | float32 | [200 300] | 74944 | 23484 | 7.446005791 |
| AllClose | float32 | [205 350] | 68959 | 29351 | 7.368948247 |
| AllClose | float32 | [350 105] | 77599 | 24249 | 8.441007877 |
| AllClose | float32 | [405 200] | 70319 | 30400 | 5.480460526 |
| AllClose | float32 | [10 10 10] | 75583 | 24088 | 8.289521754 |
| AllClose | float32 | [10 10 30] | 71392 | 27072 | 8.74874409 |
| AllClose | float32 | [10 30 10] | 70879 | 24335 | 7.670844463 |
| AllClose | float32 | [30 10 10] | 70927 | 24246 | 8.711292584 |
| AllClose | float32 | [30 30 30] | 65711 | 25455 | 8.340915341 |
| AllClose | float32 | [50 100 50] | 67248 | 33952 | 4.916558671 |
| AllClose | float32 | [100 50 100] | 81231 | 36120 | 4.545681063 |
| AllClose | float32 | [100 100 100] | 118463 | 51443 | 3.24114068 |
| AllClose | float32 | [100 100 300] | 278957 | 111225 | 2.550478759 |
| AllClose | float32 | [300 100 100] | 279758 | 111260 | 2.556588172 |
| AllClose | float32 | [10 10 10 10] | 78799 | 24726 | 6.452762275 |
| AllClose | float32 | [10 10 10 30] | 69551 | 27250 | 5.766422018 |
| AllClose | float32 | [30 10 10 10] | 70415 | 24744 | 6.627141933 |
| AllClose | float32 | [30 30 30 30] | 96527 | 45595 | 4.744336002 |
| AllClose | float32 | [50 100 50 100] | 1803536 | 761815 | 2.374497746 |
| AllClose | float32 | [100 50 100 50] | 1807936 | 762918 | 2.376790166 |
| AllClose | float32 | [100 100 100 100] | 7117329 | 2999150 | 2.374897221 |
| AllClose | float32 | [100 100 300 100] | 21184882 | 8978540 | 2.360095294 |
| AllClose | float32 | [300 100 100 100] | 21182382 | 8982030 | 2.358899937 |

BFP16
| op_name | dtype | input_size | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|
| AllClose | bfloat16 | [50 100] | 82896 | 23075 | 10.61295775 |
| AllClose | bfloat16 | [100 50] | 82095 | 21884 | 7.810592213 |
| AllClose | bfloat16 | [100 100] | 80448 | 24249 | 10.28256835 |
| AllClose | bfloat16 | [100 300] | 73840 | 22364 | 9.013682704 |
| AllClose | bfloat16 | [300 100] | 73712 | 24942 | 6.933766338 |
| AllClose | bfloat16 | [200 300] | 76991 | 22346 | 11.63223843 |
| AllClose | bfloat16 | [205 350] | 70320 | 25564 | 8.669613519 |
| AllClose | bfloat16 | [350 105] | 82063 | 21475 | 7.942956927 |
| AllClose | bfloat16 | [405 200] | 73775 | 25369 | 8.384327329 |
| AllClose | bfloat16 | [10 10 10] | 82079 | 24975 | 9.395555556 |
| AllClose | bfloat16 | [10 10 30] | 73312 | 23855 | 7.619283169 |
| AllClose | bfloat16 | [10 30 10] | 71359 | 23837 | 10.23207618 |
| AllClose | bfloat16 | [30 10 10] | 71232 | 23837 | 9.934723329 |
| AllClose | bfloat16 | [30 30 30] | 66000 | 23855 | 7.161182142 |
| AllClose | bfloat16 | [50 100 50] | 70336 | 37311 | 4.760821206 |
| AllClose | bfloat16 | [100 50 100] | 76559 | 37222 | 4.416286067 |
| AllClose | bfloat16 | [100 100 100] | 106799 | 50003 | 3.402655841 |
| AllClose | bfloat16 | [100 100 300] | 194222 | 111598 | 2.002311869 |
| AllClose | bfloat16 | [300 100 100] | 198590 | 110496 | 1.840555314 |
| AllClose | bfloat16 | [10 10 10 10] | 82351 | 21029 | 7.618336583 |
| AllClose | bfloat16 | [10 10 10 30] | 73392 | 22113 | 7.356306245 |
| AllClose | bfloat16 | [30 10 10 10] | 75279 | 23660 | 6.247802198 |
| AllClose | bfloat16 | [30 30 30 30] | 82896 | 45151 | 5.34769994 |
| AllClose | bfloat16 | [50 100 50 100] | 1091414 | 769476 | 1.425268624 |
| AllClose | bfloat16 | [100 50 100 50] | 1083094 | 771486 | 1.410791641 |
| AllClose | bfloat16 | [100 100 100 100] | 4123403 | 3016590 | 1.368680199 |
| AllClose | bfloat16 | [100 100 300 100] | 12313553 | 9019080 | 1.365868913 |
| AllClose | bfloat16 | [300 100 100 100] | 12305455 | 9021750 | 1.364565301 |

hieule88, Feb 26 '25 09:02