MIOpen
Implement Allclose
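- Added Allclose.
- Added driver test and gtest for Allclose.
- The new API is guarded by the `MIOPEN_BETA_API` macro.
- Average improvement over ROCm across all cases for Allclose (the mean of the "improvement over rocm" column in each per-type table):
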
| Type | Allclose |
|---|---|
| float16 | 6.04 |
| float32 | 6.06 |
| bfloat16 | 6.27 |
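
As a sanity check on the summary, the float16 figure can be reproduced from the per-case numbers. The sketch below assumes the summary is a plain arithmetic mean of the "improvement over rocm" column; the values are copied as reported from the FP16 table, and the variable names are illustrative only.

```python
from statistics import mean

# "improvement over rocm" for the 28 float16 cases, in table order.
fp16_improvement = [
    10.96946818, 8.721085369, 10.78873239, 8.67873467,
    8.561713614, 10.21737103, 6.970377197, 9.371595331,
    6.68282161, 5.899903015, 6.860221651, 9.952146376,
    8.543067186, 7.691589531, 6.378993299, 4.566010723,
    3.38583482, 2.083791047, 1.765756446, 6.867372327,
    6.524609844, 5.638873407, 5.412103354, 1.403990751,
    1.410049877, 1.357847236, 0.972901467, 1.361886814,
]

# Arithmetic mean over all cases, rounded as in the summary table.
fp16_average = round(mean(fp16_improvement), 2)
print(f"float16 average improvement: {fp16_average}")
```

The same calculation over the FP32 and BFP16 tables should give the other two rows of the summary.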
FP16
| op_name | dtype | input_size | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|
| AllClose | float16 | [50 100] | 85599 | 24106 | 10.96946818 |
| AllClose | float16 | [100 50] | 80351 | 22186 | 8.721085369 |
| AllClose | float16 | [100 100] | 79855 | 22791 | 10.78873239 |
| AllClose | float16 | [100 300] | 72655 | 22097 | 8.67873467 |
| AllClose | float16 | [300 100] | 71951 | 22969 | 8.561713614 |
| AllClose | float16 | [200 300] | 76288 | 23591 | 10.21737103 |
| AllClose | float16 | [205 350] | 70287 | 30382 | 6.970377197 |
| AllClose | float16 | [350 105] | 80367 | 23644 | 9.371595331 |
| AllClose | float16 | [405 200] | 72704 | 29671 | 6.68282161 |
| AllClose | float16 | [10 10 10] | 82863 | 24746 | 5.899903015 |
| AllClose | float16 | [10 10 30] | 73071 | 26077 | 6.860221651 |
| AllClose | float16 | [10 30 10] | 71087 | 24157 | 9.952146376 |
| AllClose | float16 | [30 10 10] | 71264 | 25437 | 8.543067186 |
| AllClose | float16 | [30 30 30] | 64303 | 25064 | 7.691589531 |
| AllClose | float16 | [50 100 50] | 69951 | 32085 | 6.378993299 |
| AllClose | float16 | [100 50 100] | 74735 | 35623 | 4.566010723 |
| AllClose | float16 | [100 100 100] | 104239 | 49897 | 3.38583482 |
| AllClose | float16 | [100 100 300] | 192270 | 110513 | 2.083791047 |
| AllClose | float16 | [300 100 100] | 189790 | 110193 | 1.765756446 |
| AllClose | float16 | [10 10 10 10] | 82447 | 23713 | 6.867372327 |
| AllClose | float16 | [10 10 10 30] | 73967 | 24157 | 6.524609844 |
| AllClose | float16 | [30 10 10 10] | 73391 | 26913 | 5.638873407 |
| AllClose | float16 | [30 30 30 30] | 80240 | 44120 | 5.412103354 |
| AllClose | float16 | [50 100 50 100] | 1074263 | 768978 | 1.403990751 |
| AllClose | float16 | [100 50 100 50] | 1080566 | 770099 | 1.410049877 |
| AllClose | float16 | [100 100 100 100] | 4089980 | 3016030 | 1.357847236 |
| AllClose | float16 | [100 100 300 100] | 12305426 | 12653600 | 0.972901467 |
| AllClose | float16 | [300 100 100 100] | 12280144 | 9020920 | 1.361886814 |
FP32
| op_name | dtype | input_size | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|
| AllClose | float32 | [50 100] | 80160 | 23004 | 9.862545644 |
| AllClose | float32 | [100 50] | 75471 | 23395 | 10.5820047 |
| AllClose | float32 | [100 100] | 75887 | 23164 | 7.548868935 |
| AllClose | float32 | [100 300] | 68943 | 23182 | 9.169096713 |
| AllClose | float32 | [300 100] | 69183 | 23946 | 8.736239873 |
| AllClose | float32 | [200 300] | 74944 | 23484 | 7.446005791 |
| AllClose | float32 | [205 350] | 68959 | 29351 | 7.368948247 |
| AllClose | float32 | [350 105] | 77599 | 24249 | 8.441007877 |
| AllClose | float32 | [405 200] | 70319 | 30400 | 5.480460526 |
| AllClose | float32 | [10 10 10] | 75583 | 24088 | 8.289521754 |
| AllClose | float32 | [10 10 30] | 71392 | 27072 | 8.74874409 |
| AllClose | float32 | [10 30 10] | 70879 | 24335 | 7.670844463 |
| AllClose | float32 | [30 10 10] | 70927 | 24246 | 8.711292584 |
| AllClose | float32 | [30 30 30] | 65711 | 25455 | 8.340915341 |
| AllClose | float32 | [50 100 50] | 67248 | 33952 | 4.916558671 |
| AllClose | float32 | [100 50 100] | 81231 | 36120 | 4.545681063 |
| AllClose | float32 | [100 100 100] | 118463 | 51443 | 3.24114068 |
| AllClose | float32 | [100 100 300] | 278957 | 111225 | 2.550478759 |
| AllClose | float32 | [300 100 100] | 279758 | 111260 | 2.556588172 |
| AllClose | float32 | [10 10 10 10] | 78799 | 24726 | 6.452762275 |
| AllClose | float32 | [10 10 10 30] | 69551 | 27250 | 5.766422018 |
| AllClose | float32 | [30 10 10 10] | 70415 | 24744 | 6.627141933 |
| AllClose | float32 | [30 30 30 30] | 96527 | 45595 | 4.744336002 |
| AllClose | float32 | [50 100 50 100] | 1803536 | 761815 | 2.374497746 |
| AllClose | float32 | [100 50 100 50] | 1807936 | 762918 | 2.376790166 |
| AllClose | float32 | [100 100 100 100] | 7117329 | 2999150 | 2.374897221 |
| AllClose | float32 | [100 100 300 100] | 21184882 | 8978540 | 2.360095294 |
| AllClose | float32 | [300 100 100 100] | 21182382 | 8982030 | 2.358899937 |
BFP16
| op_name | dtype | input_size | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|
| AllClose | bfloat16 | [50 100] | 82896 | 23075 | 10.61295775 |
| AllClose | bfloat16 | [100 50] | 82095 | 21884 | 7.810592213 |
| AllClose | bfloat16 | [100 100] | 80448 | 24249 | 10.28256835 |
| AllClose | bfloat16 | [100 300] | 73840 | 22364 | 9.013682704 |
| AllClose | bfloat16 | [300 100] | 73712 | 24942 | 6.933766338 |
| AllClose | bfloat16 | [200 300] | 76991 | 22346 | 11.63223843 |
| AllClose | bfloat16 | [205 350] | 70320 | 25564 | 8.669613519 |
| AllClose | bfloat16 | [350 105] | 82063 | 21475 | 7.942956927 |
| AllClose | bfloat16 | [405 200] | 73775 | 25369 | 8.384327329 |
| AllClose | bfloat16 | [10 10 10] | 82079 | 24975 | 9.395555556 |
| AllClose | bfloat16 | [10 10 30] | 73312 | 23855 | 7.619283169 |
| AllClose | bfloat16 | [10 30 10] | 71359 | 23837 | 10.23207618 |
| AllClose | bfloat16 | [30 10 10] | 71232 | 23837 | 9.934723329 |
| AllClose | bfloat16 | [30 30 30] | 66000 | 23855 | 7.161182142 |
| AllClose | bfloat16 | [50 100 50] | 70336 | 37311 | 4.760821206 |
| AllClose | bfloat16 | [100 50 100] | 76559 | 37222 | 4.416286067 |
| AllClose | bfloat16 | [100 100 100] | 106799 | 50003 | 3.402655841 |
| AllClose | bfloat16 | [100 100 300] | 194222 | 111598 | 2.002311869 |
| AllClose | bfloat16 | [300 100 100] | 198590 | 110496 | 1.840555314 |
| AllClose | bfloat16 | [10 10 10 10] | 82351 | 21029 | 7.618336583 |
| AllClose | bfloat16 | [10 10 10 30] | 73392 | 22113 | 7.356306245 |
| AllClose | bfloat16 | [30 10 10 10] | 75279 | 23660 | 6.247802198 |
| AllClose | bfloat16 | [30 30 30 30] | 82896 | 45151 | 5.34769994 |
| AllClose | bfloat16 | [50 100 50 100] | 1091414 | 769476 | 1.425268624 |
| AllClose | bfloat16 | [100 50 100 50] | 1083094 | 771486 | 1.410791641 |
| AllClose | bfloat16 | [100 100 100 100] | 4123403 | 3016590 | 1.368680199 |
| AllClose | bfloat16 | [100 100 300 100] | 12313553 | 9019080 | 1.365868913 |
| AllClose | bfloat16 | [300 100 100 100] | 12305455 | 9021750 | 1.364565301 |
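
For reference, here is a minimal sketch of the check that an allclose operation conventionally performs. It assumes the NumPy/PyTorch definition, `|x - y| <= atol + rtol * |y|` for every element pair, with the same default tolerances. This description does not state the exact semantics or signature of the MIOpen API, so the function below is a hypothetical pure-Python reference and not the MIOpen implementation.

```python
import math


def allclose_reference(a, b, rtol=1e-5, atol=1e-8, equal_nan=False):
    """Return True if all pairs satisfy |x - y| <= atol + rtol * |y|.

    Hypothetical reference over flat sequences of floats, following the
    NumPy/PyTorch convention (an assumption, not the MIOpen API).
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            # NaNs only match each other, and only when equal_nan is set.
            if equal_nan and math.isnan(x) and math.isnan(y):
                continue
            return False
        if x == y:
            # Covers exact matches, including infinities of the same sign.
            continue
        if math.isinf(x) or math.isinf(y):
            # An infinity never matches a different value.
            return False
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True


close = allclose_reference([1.0, 2.0], [1.0, 2.0 + 1e-9])
far = allclose_reference([1.0, 2.0], [1.0, 2.1])
print(close, far)
```

The tolerance is scaled by the second argument only, so the check is not symmetric in `a` and `b`.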