MIOpen
Implement Normalize Backward
- Add NormalizeBackward API wrapped by MIOPEN_BETA_API
- Add a solver and kernel for the following cases, which show a performance improvement over ROCm PyTorch:
  - the tensor is contiguous,
  - the reduce dimension is the last dimension,
  - the reduce size is a multiple of 256 (256 is the kernel's block size),
  - outer_size >= the number of CUs.

  Note that bfp16 is not supported: HIP performs bfp16 arithmetic only after converting the values to fp32, which reduces performance.
- Add a driver and gtest for NormalizeBackward.
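The solver conditions listed above can be read as a single applicability predicate. The sketch below is only an illustration of those checks, not MIOpen's actual solver interface: `NormalizeProblem`, `DataType`, `IsApplicable` and `kBlockSize` are hypothetical names, the reduce dimension is assumed to be 0-based (as in the `reduce_dim` column of the tables), and the number of CUs is assumed to be passed in by the caller.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical data types; bfp16 is rejected as described in the PR.
enum class DataType { Float32, Float16, BFloat16 };

// Hypothetical problem description (not MIOpen's real type).
struct NormalizeProblem {
    std::vector<std::size_t> lengths;  // tensor shape
    bool contiguous;                   // true if the tensor is packed
    std::size_t dim;                   // 0-based reduce dimension
    DataType dtype;
};

// Block size chosen for the kernel.
constexpr std::size_t kBlockSize = 256;

// Returns true only when every condition listed in the PR holds.
bool IsApplicable(const NormalizeProblem& p, std::size_t num_cus) {
    if (p.lengths.empty() || !p.contiguous) return false;
    if (p.dtype == DataType::BFloat16) return false;  // bfp16 not supported
    // The reduction must run over the last dimension.
    if (p.dim != p.lengths.size() - 1) return false;
    // The reduce size must be a multiple of the block size.
    if (p.lengths.back() % kBlockSize != 0) return false;
    // outer_size is the product of all dimensions except the reduced one.
    std::size_t outer_size = 1;
    for (std::size_t i = 0; i + 1 < p.lengths.size(); ++i) outer_size *= p.lengths[i];
    return outer_size >= num_cus;
}
```

For example, the benchmarked shape `11 37 11 11 1280` with reduce_dim 4 passes on a hypothetical GPU with 104 CUs: 1280 = 5 × 256 and outer_size = 11 × 37 × 11 × 11 = 49247.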
Average improvement over ROCm (mean of the Improvement column across all benchmarked shapes):

| type | Forward |
|---|---|
| float32 | 2.32 |
| float16 | 2.52 |
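The Improvement column in the tables matches the ROCm value divided by the MIOpen value, and the averages above are the arithmetic mean of that column. The sketch below shows that calculation, assuming both columns are measurements of the same kind and unit (the unit is not stated, but it cancels out in the ratio). `BenchRow`, `Improvement` and `MeanImprovement` are hypothetical helper names, not part of the MIOpen driver.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One benchmark row: ROCm and MIOpen measurements in the same unit.
struct BenchRow {
    double rocm;
    double miopen;
};

// Improvement = ROCm / MIOpen; values above 1 mean MIOpen is faster.
double Improvement(const BenchRow& r) {
    assert(r.miopen > 0.0);
    return r.rocm / r.miopen;
}

// Arithmetic mean of the per-row improvements.
double MeanImprovement(const std::vector<BenchRow>& rows) {
    assert(!rows.empty());
    double sum = 0.0;
    for (const BenchRow& r : rows) sum += Improvement(r);
    return sum / static_cast<double>(rows.size());
}
```

For the first fp32 row, 4434010 / 1283040 ≈ 3.456, which rounds to the reported 3.46. Averaging the 44 values in the fp32 Improvement column gives ≈ 2.32, the float32 entry above.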
fp32
| input_size | num_elem | p | reduce_dim | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|---|
| 11 37 11 11 1280 | 63036160 | 2 | 4 | 4434010 | 1283040 | 3.46 |
| 11 37 11 11 1280 | 63036160 | 2.5 | 4 | 4923864 | 3440900 | 1.43 |
| 13 2 3221 1024 | 85755904 | 2 | 3 | 6017754 | 1752640 | 3.43 |
| 13 2 3221 1024 | 85755904 | 2.5 | 3 | 6631930 | 4684560 | 1.42 |
| 131 97 3584 | 45541888 | 2 | 2 | 3220495 | 931529 | 3.46 |
| 131 97 3584 | 45541888 | 2.5 | 2 | 3562568 | 2492200 | 1.43 |
| 14387 2 3 256 | 22098432 | 2 | 3 | 1604645 | 557403 | 2.88 |
| 14387 2 3 256 | 22098432 | 2.5 | 3 | 1793928 | 1312380 | 1.37 |
| 172 2 2 2 35584 | 48963584 | 2 | 4 | 3453336 | 1001470 | 3.45 |
| 172 2 2 2 35584 | 48963584 | 2.5 | 4 | 3831136 | 2683940 | 1.43 |
| 1877 2 23 2 512 | 88414208 | 2 | 4 | 6173956 | 1875100 | 3.29 |
| 1877 2 23 2 512 | 88414208 | 2.5 | 4 | 6831684 | 4907600 | 1.39 |
| 19 3 17 2 29696 | 57550848 | 2 | 4 | 4051365 | 1170830 | 3.46 |
| 19 3 17 2 29696 | 57550848 | 2.5 | 4 | 4496589 | 3149270 | 1.43 |
| 19 5 4027 256 | 97936640 | 2 | 3 | 6998025 | 2403470 | 2.91 |
| 19 5 4027 256 | 97936640 | 2.5 | 3 | 7737972 | 5755220 | 1.34 |
| 2 29 3 2 141056 | 49087488 | 2 | 4 | 3479493 | 1022340 | 3.40 |
| 2 29 3 2 141056 | 49087488 | 2.5 | 4 | 3843096 | 2702680 | 1.42 |
| 2 311 31 2048 | 39489536 | 2 | 3 | 2767901 | 802871 | 3.45 |
| 2 311 31 2048 | 39489536 | 2.5 | 3 | 3097582 | 2146900 | 1.44 |
| 2 37957 2 256 | 38867968 | 2 | 3 | 2779540 | 955178 | 2.91 |
| 2 37957 2 256 | 38867968 | 2.5 | 3 | 3088572 | 2277070 | 1.36 |
| 2 49363 3 256 | 75821568 | 2 | 3 | 5452614 | 1849560 | 2.95 |
| 2 49363 3 256 | 75821568 | 2.5 | 3 | 6012118 | 4430710 | 1.36 |
| 2 63541 3 256 | 97598976 | 2 | 3 | 6971395 | 2374200 | 2.94 |
| 2 63541 3 256 | 97598976 | 2.5 | 3 | 7712790 | 5699290 | 1.35 |
| 2 9818 2 512 | 20107264 | 2 | 3 | 1441360 | 437402 | 3.30 |
| 2 9818 2 512 | 20107264 | 2.5 | 3 | 1614396 | 1117440 | 1.44 |
| 20549 7 2 256 | 73647616 | 2 | 3 | 5284847 | 1812170 | 2.92 |
| 20549 7 2 256 | 73647616 | 2.5 | 3 | 5880898 | 4340240 | 1.35 |
| 4 3881 21 256 | 83457024 | 2 | 3 | 5968965 | 2058190 | 2.90 |
| 4 3881 21 256 | 83457024 | 2.5 | 3 | 6619226 | 4902430 | 1.35 |
| 5 11 5 5 54272 | 74624000 | 2 | 4 | 5238577 | 1501170 | 3.49 |
| 5 11 5 5 54272 | 74624000 | 2.5 | 4 | 5789439 | 4060110 | 1.43 |
| 5 3 211 3840 | 12153600 | 2 | 3 | 895535 | 263778 | 3.40 |
| 5 3 211 3840 | 12153600 | 2.5 | 3 | 1003046 | 677144 | 1.48 |
| 61613 1280 | 78864640 | 2 | 1 | 5543941 | 1599840 | 3.47 |
| 61613 1280 | 78864640 | 2.5 | 1 | 6118706 | 4306930 | 1.42 |
| 651 133888 | 87161088 | 2 | 1 | 6092961 | 1761470 | 3.46 |
| 651 133888 | 87161088 | 2.5 | 1 | 6715880 | 4758320 | 1.41 |
| 7 59 2 100864 | 83313664 | 2 | 3 | 5852554 | 1672450 | 3.50 |
| 7 59 2 100864 | 83313664 | 2.5 | 3 | 6480370 | 4542550 | 1.43 |
| 78533 256 | 20104448 | 2 | 1 | 1468958 | 508019 | 2.89 |
| 78533 256 | 20104448 | 2.5 | 1 | 1649928 | 1196430 | 1.38 |
fp16
| opname | dtype | input_size | num_elem | p | reduce_dim | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|---|---|---|
| Normalize | float16 | 11 37 11 11 1280 | 63036160 | 2 | 4 | 3196507 | 1083980 | 2.95 |
| Normalize | float16 | 11 37 11 11 1280 | 63036160 | 2.5 | 4 | 3142656 | 1245010 | 2.52 |
| Normalize | float16 | 13 2 3221 1024 | 85755904 | 2 | 3 | 4319363 | 1501150 | 2.88 |
| Normalize | float16 | 13 2 3221 1024 | 85755904 | 2.5 | 3 | 4227689 | 1719890 | 2.46 |
| Normalize | float16 | 131 97 3584 | 45541888 | 2 | 2 | 2332048 | 781748 | 2.98 |
| Normalize | float16 | 131 97 3584 | 45541888 | 2.5 | 2 | 2285526 | 898533 | 2.54 |
| Normalize | float16 | 14387 2 3 256 | 22098432 | 2 | 3 | 1233432 | 516762 | 2.39 |
| Normalize | float16 | 14387 2 3 256 | 22098432 | 2.5 | 3 | 1230406 | 572195 | 2.15 |
| Normalize | float16 | 172 2 2 2 35584 | 48963584 | 2 | 4 | 2509903 | 862319 | 2.91 |
| Normalize | float16 | 172 2 2 2 35584 | 48963584 | 2.5 | 4 | 2459371 | 986305 | 2.49 |
| Normalize | float16 | 1877 2 23 2 512 | 88414208 | 2 | 4 | 4514497 | 1691070 | 2.67 |
| Normalize | float16 | 1877 2 23 2 512 | 88414208 | 2.5 | 4 | 4431526 | 1915170 | 2.31 |
| Normalize | float16 | 19 3 17 2 29696 | 57550848 | 2 | 4 | 2931277 | 1013720 | 2.89 |
| Normalize | float16 | 19 3 17 2 29696 | 57550848 | 2.5 | 4 | 2882340 | 1158760 | 2.49 |
| Normalize | float16 | 19 5 4027 256 | 97936640 | 2 | 3 | 5168268 | 2234190 | 2.31 |
| Normalize | float16 | 19 5 4027 256 | 97936640 | 2.5 | 3 | 5068704 | 2483460 | 2.04 |
| Normalize | float16 | 2 29 3 2 141056 | 49087488 | 2 | 4 | 2519934 | 909612 | 2.77 |
| Normalize | float16 | 2 29 3 2 141056 | 49087488 | 2.5 | 4 | 2465106 | 1034510 | 2.38 |
| Normalize | float16 | 2 311 31 2048 | 39489536 | 2 | 3 | 2037826 | 674707 | 3.02 |
| Normalize | float16 | 2 311 31 2048 | 39489536 | 2.5 | 3 | 2011590 | 773217 | 2.60 |
| Normalize | float16 | 2 37957 2 256 | 38867968 | 2 | 3 | 2095106 | 887283 | 2.36 |
| Normalize | float16 | 2 37957 2 256 | 38867968 | 2.5 | 3 | 2068248 | 985811 | 2.10 |
| Normalize | float16 | 2 49363 3 256 | 75821568 | 2 | 3 | 4035139 | 1717550 | 2.35 |
| Normalize | float16 | 2 49363 3 256 | 75821568 | 2.5 | 3 | 3960294 | 1906930 | 2.08 |
| Normalize | float16 | 2 63541 3 256 | 97598976 | 2 | 3 | 5156972 | 2205620 | 2.34 |
| Normalize | float16 | 2 63541 3 256 | 97598976 | 2.5 | 3 | 5052478 | 2449970 | 2.06 |
| Normalize | float16 | 2 9818 2 512 | 20107264 | 2 | 3 | 1087337 | 395676 | 2.75 |
| Normalize | float16 | 2 9818 2 512 | 20107264 | 2.5 | 3 | 1076638 | 446629 | 2.41 |
| Normalize | float16 | 20549 7 2 256 | 73647616 | 2 | 3 | 3906774 | 1686440 | 2.32 |
| Normalize | float16 | 20549 7 2 256 | 73647616 | 2.5 | 3 | 3844973 | 1872940 | 2.05 |
| Normalize | float16 | 4 3881 21 256 | 83457024 | 2 | 3 | 4418577 | 1906380 | 2.32 |
| Normalize | float16 | 4 3881 21 256 | 83457024 | 2.5 | 3 | 4324286 | 2113760 | 2.05 |
| Normalize | float16 | 5 11 5 5 54272 | 74624000 | 2 | 4 | 3764358 | 1305290 | 2.88 |
| Normalize | float16 | 5 11 5 5 54272 | 74624000 | 2.5 | 4 | 3685521 | 1493160 | 2.47 |
| Normalize | float16 | 5 3 211 3840 | 12153600 | 2 | 3 | 680443 | 222692 | 3.06 |
| Normalize | float16 | 5 3 211 3840 | 12153600 | 2.5 | 3 | 694004 | 253591 | 2.74 |
| Normalize | float16 | 61613 1280 | 78864640 | 2 | 1 | 3968165 | 1357260 | 2.92 |
| Normalize | float16 | 61613 1280 | 78864640 | 2.5 | 1 | 3892473 | 1556220 | 2.50 |
| Normalize | float16 | 651 133888 | 87161088 | 2 | 1 | 4380067 | 1553620 | 2.82 |
| Normalize | float16 | 651 133888 | 87161088 | 2.5 | 1 | 4282347 | 1778860 | 2.41 |
| Normalize | float16 | 7 59 2 100864 | 83313664 | 2 | 3 | 4206771 | 1451800 | 2.90 |
| Normalize | float16 | 7 59 2 100864 | 83313664 | 2.5 | 3 | 4119342 | 1660810 | 2.48 |
| Normalize | float16 | 78533 256 | 20104448 | 2 | 1 | 1127065 | 471875 | 2.39 |
| Normalize | float16 | 78533 256 | 20104448 | 2.5 | 1 | 1126806 | 523236 | 2.15 |