
Implement Normalize Backward

Open littlecutebird opened this issue 11 months ago • 0 comments

  • Add NormalizeBackward API wrapped by MIOPEN_BETA_API
  • Add a solver and kernel for the following case: the tensor is contiguous, the reduce dimension is the last dimension, reduce size % 256 = 0 (256 is the kernel block size), and outer_size >= number of CUs. In this case the kernel shows a performance improvement over ROCm PyTorch. Note that bfp16 is not supported: HIP converts bfp16 values to fp32 before computing, which reduces performance.
  • Add driver and gtest for NormalizeBackward.
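
The applicability conditions listed above can be sketched as a small predicate. This is only an illustration of the stated constraints, not the actual MIOpen solver code: the function names, the stride representation, and the `num_cus` value used in the example are all assumptions made for this sketch.

```python
from math import prod

BLOCK_SIZE = 256  # kernel block size stated in the issue


def contiguous_strides(shape):
    """Row-major (packed) strides, in elements, for the given shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def is_fast_path_applicable(shape, strides, reduce_dim, num_cus):
    """Hypothetical check mirroring the four conditions in the issue text."""
    shape = tuple(shape)
    if not shape:
        return False
    # 1. Tensor must be contiguous.
    if tuple(strides) != contiguous_strides(shape):
        return False
    # 2. The reduce dimension must be the last dimension (0-based index).
    if reduce_dim != len(shape) - 1:
        return False
    # 3. Reduce size must be a multiple of the block size.
    reduce_size = shape[-1]
    if reduce_size % BLOCK_SIZE != 0:
        return False
    # 4. There must be at least one outer row per compute unit.
    outer_size = prod(shape[:-1])
    return outer_size >= num_cus


# Example with a shape from the benchmark tables; num_cus=120 is an arbitrary
# placeholder, not a value taken from the issue.
shape = (11, 37, 11, 11, 1280)
print(is_fast_path_applicable(shape, contiguous_strides(shape), 4, 120))
```

With this reading, every shape in the benchmark tables below qualifies: each has a last dimension divisible by 256 and `reduce_dim` equal to the last 0-based index.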
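
Average improvement over ROCm PyTorch (mean of the Improvement column in the fp32 and fp16 tables):

| type | Improvement |
|---|---|
| float32 | 2.32 |
| float16 | 2.52 |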

fp32

| input_size | num_elem | p | reduce_dim | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|---|
| 11 37 11 11 1280 | 63036160 | 2 | 4 | 4434010 | 1283040 | 3.46 |
| 11 37 11 11 1280 | 63036160 | 2.5 | 4 | 4923864 | 3440900 | 1.43 |
| 13 2 3221 1024 | 85755904 | 2 | 3 | 6017754 | 1752640 | 3.43 |
| 13 2 3221 1024 | 85755904 | 2.5 | 3 | 6631930 | 4684560 | 1.42 |
| 131 97 3584 | 45541888 | 2 | 2 | 3220495 | 931529 | 3.46 |
| 131 97 3584 | 45541888 | 2.5 | 2 | 3562568 | 2492200 | 1.43 |
| 14387 2 3 256 | 22098432 | 2 | 3 | 1604645 | 557403 | 2.88 |
| 14387 2 3 256 | 22098432 | 2.5 | 3 | 1793928 | 1312380 | 1.37 |
| 172 2 2 2 35584 | 48963584 | 2 | 4 | 3453336 | 1001470 | 3.45 |
| 172 2 2 2 35584 | 48963584 | 2.5 | 4 | 3831136 | 2683940 | 1.43 |
| 1877 2 23 2 512 | 88414208 | 2 | 4 | 6173956 | 1875100 | 3.29 |
| 1877 2 23 2 512 | 88414208 | 2.5 | 4 | 6831684 | 4907600 | 1.39 |
| 19 3 17 2 29696 | 57550848 | 2 | 4 | 4051365 | 1170830 | 3.46 |
| 19 3 17 2 29696 | 57550848 | 2.5 | 4 | 4496589 | 3149270 | 1.43 |
| 19 5 4027 256 | 97936640 | 2 | 3 | 6998025 | 2403470 | 2.91 |
| 19 5 4027 256 | 97936640 | 2.5 | 3 | 7737972 | 5755220 | 1.34 |
| 2 29 3 2 141056 | 49087488 | 2 | 4 | 3479493 | 1022340 | 3.40 |
| 2 29 3 2 141056 | 49087488 | 2.5 | 4 | 3843096 | 2702680 | 1.42 |
| 2 311 31 2048 | 39489536 | 2 | 3 | 2767901 | 802871 | 3.45 |
| 2 311 31 2048 | 39489536 | 2.5 | 3 | 3097582 | 2146900 | 1.44 |
| 2 37957 2 256 | 38867968 | 2 | 3 | 2779540 | 955178 | 2.91 |
| 2 37957 2 256 | 38867968 | 2.5 | 3 | 3088572 | 2277070 | 1.36 |
| 2 49363 3 256 | 75821568 | 2 | 3 | 5452614 | 1849560 | 2.95 |
| 2 49363 3 256 | 75821568 | 2.5 | 3 | 6012118 | 4430710 | 1.36 |
| 2 63541 3 256 | 97598976 | 2 | 3 | 6971395 | 2374200 | 2.94 |
| 2 63541 3 256 | 97598976 | 2.5 | 3 | 7712790 | 5699290 | 1.35 |
| 2 9818 2 512 | 20107264 | 2 | 3 | 1441360 | 437402 | 3.30 |
| 2 9818 2 512 | 20107264 | 2.5 | 3 | 1614396 | 1117440 | 1.44 |
| 20549 7 2 256 | 73647616 | 2 | 3 | 5284847 | 1812170 | 2.92 |
| 20549 7 2 256 | 73647616 | 2.5 | 3 | 5880898 | 4340240 | 1.35 |
| 4 3881 21 256 | 83457024 | 2 | 3 | 5968965 | 2058190 | 2.90 |
| 4 3881 21 256 | 83457024 | 2.5 | 3 | 6619226 | 4902430 | 1.35 |
| 5 11 5 5 54272 | 74624000 | 2 | 4 | 5238577 | 1501170 | 3.49 |
| 5 11 5 5 54272 | 74624000 | 2.5 | 4 | 5789439 | 4060110 | 1.43 |
| 5 3 211 3840 | 12153600 | 2 | 3 | 895535 | 263778 | 3.40 |
| 5 3 211 3840 | 12153600 | 2.5 | 3 | 1003046 | 677144 | 1.48 |
| 61613 1280 | 78864640 | 2 | 1 | 5543941 | 1599840 | 3.47 |
| 61613 1280 | 78864640 | 2.5 | 1 | 6118706 | 4306930 | 1.42 |
| 651 133888 | 87161088 | 2 | 1 | 6092961 | 1761470 | 3.46 |
| 651 133888 | 87161088 | 2.5 | 1 | 6715880 | 4758320 | 1.41 |
| 7 59 2 100864 | 83313664 | 2 | 3 | 5852554 | 1672450 | 3.50 |
| 7 59 2 100864 | 83313664 | 2.5 | 3 | 6480370 | 4542550 | 1.43 |
| 78533 256 | 20104448 | 2 | 1 | 1468958 | 508019 | 2.89 |
| 78533 256 | 20104448 | 2.5 | 1 | 1649928 | 1196430 | 1.38 |

fp16

| opname | dtype | input_size | num_elem | p | reduce_dim | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|---|---|---|
| Normalize | float16 | 11 37 11 11 1280 | 63036160 | 2 | 4 | 3196507 | 1083980 | 2.95 |
| Normalize | float16 | 11 37 11 11 1280 | 63036160 | 2.5 | 4 | 3142656 | 1245010 | 2.52 |
| Normalize | float16 | 13 2 3221 1024 | 85755904 | 2 | 3 | 4319363 | 1501150 | 2.88 |
| Normalize | float16 | 13 2 3221 1024 | 85755904 | 2.5 | 3 | 4227689 | 1719890 | 2.46 |
| Normalize | float16 | 131 97 3584 | 45541888 | 2 | 2 | 2332048 | 781748 | 2.98 |
| Normalize | float16 | 131 97 3584 | 45541888 | 2.5 | 2 | 2285526 | 898533 | 2.54 |
| Normalize | float16 | 14387 2 3 256 | 22098432 | 2 | 3 | 1233432 | 516762 | 2.39 |
| Normalize | float16 | 14387 2 3 256 | 22098432 | 2.5 | 3 | 1230406 | 572195 | 2.15 |
| Normalize | float16 | 172 2 2 2 35584 | 48963584 | 2 | 4 | 2509903 | 862319 | 2.91 |
| Normalize | float16 | 172 2 2 2 35584 | 48963584 | 2.5 | 4 | 2459371 | 986305 | 2.49 |
| Normalize | float16 | 1877 2 23 2 512 | 88414208 | 2 | 4 | 4514497 | 1691070 | 2.67 |
| Normalize | float16 | 1877 2 23 2 512 | 88414208 | 2.5 | 4 | 4431526 | 1915170 | 2.31 |
| Normalize | float16 | 19 3 17 2 29696 | 57550848 | 2 | 4 | 2931277 | 1013720 | 2.89 |
| Normalize | float16 | 19 3 17 2 29696 | 57550848 | 2.5 | 4 | 2882340 | 1158760 | 2.49 |
| Normalize | float16 | 19 5 4027 256 | 97936640 | 2 | 3 | 5168268 | 2234190 | 2.31 |
| Normalize | float16 | 19 5 4027 256 | 97936640 | 2.5 | 3 | 5068704 | 2483460 | 2.04 |
| Normalize | float16 | 2 29 3 2 141056 | 49087488 | 2 | 4 | 2519934 | 909612 | 2.77 |
| Normalize | float16 | 2 29 3 2 141056 | 49087488 | 2.5 | 4 | 2465106 | 1034510 | 2.38 |
| Normalize | float16 | 2 311 31 2048 | 39489536 | 2 | 3 | 2037826 | 674707 | 3.02 |
| Normalize | float16 | 2 311 31 2048 | 39489536 | 2.5 | 3 | 2011590 | 773217 | 2.60 |
| Normalize | float16 | 2 37957 2 256 | 38867968 | 2 | 3 | 2095106 | 887283 | 2.36 |
| Normalize | float16 | 2 37957 2 256 | 38867968 | 2.5 | 3 | 2068248 | 985811 | 2.10 |
| Normalize | float16 | 2 49363 3 256 | 75821568 | 2 | 3 | 4035139 | 1717550 | 2.35 |
| Normalize | float16 | 2 49363 3 256 | 75821568 | 2.5 | 3 | 3960294 | 1906930 | 2.08 |
| Normalize | float16 | 2 63541 3 256 | 97598976 | 2 | 3 | 5156972 | 2205620 | 2.34 |
| Normalize | float16 | 2 63541 3 256 | 97598976 | 2.5 | 3 | 5052478 | 2449970 | 2.06 |
| Normalize | float16 | 2 9818 2 512 | 20107264 | 2 | 3 | 1087337 | 395676 | 2.75 |
| Normalize | float16 | 2 9818 2 512 | 20107264 | 2.5 | 3 | 1076638 | 446629 | 2.41 |
| Normalize | float16 | 20549 7 2 256 | 73647616 | 2 | 3 | 3906774 | 1686440 | 2.32 |
| Normalize | float16 | 20549 7 2 256 | 73647616 | 2.5 | 3 | 3844973 | 1872940 | 2.05 |
| Normalize | float16 | 4 3881 21 256 | 83457024 | 2 | 3 | 4418577 | 1906380 | 2.32 |
| Normalize | float16 | 4 3881 21 256 | 83457024 | 2.5 | 3 | 4324286 | 2113760 | 2.05 |
| Normalize | float16 | 5 11 5 5 54272 | 74624000 | 2 | 4 | 3764358 | 1305290 | 2.88 |
| Normalize | float16 | 5 11 5 5 54272 | 74624000 | 2.5 | 4 | 3685521 | 1493160 | 2.47 |
| Normalize | float16 | 5 3 211 3840 | 12153600 | 2 | 3 | 680443 | 222692 | 3.06 |
| Normalize | float16 | 5 3 211 3840 | 12153600 | 2.5 | 3 | 694004 | 253591 | 2.74 |
| Normalize | float16 | 61613 1280 | 78864640 | 2 | 1 | 3968165 | 1357260 | 2.92 |
| Normalize | float16 | 61613 1280 | 78864640 | 2.5 | 1 | 3892473 | 1556220 | 2.50 |
| Normalize | float16 | 651 133888 | 87161088 | 2 | 1 | 4380067 | 1553620 | 2.82 |
| Normalize | float16 | 651 133888 | 87161088 | 2.5 | 1 | 4282347 | 1778860 | 2.41 |
| Normalize | float16 | 7 59 2 100864 | 83313664 | 2 | 3 | 4206771 | 1451800 | 2.90 |
| Normalize | float16 | 7 59 2 100864 | 83313664 | 2.5 | 3 | 4119342 | 1660810 | 2.48 |
| Normalize | float16 | 78533 256 | 20104448 | 2 | 1 | 1127065 | 471875 | 2.39 |
| Normalize | float16 | 78533 256 | 20104448 | 2.5 | 1 | 1126806 | 523236 | 2.15 |
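
For readers checking the numbers, the Improvement column appears to be the ROCm measurement divided by the MIOpen measurement, rounded to two decimals. The sketch below recomputes it for a few rows copied from the tables; the issue does not state the units of the ROCm and MIOpen columns, and the helper name `improvement` is only illustrative.

```python
# (dtype, p, ROCm, MIOpen, reported Improvement): rows copied from the tables.
rows = [
    ("fp32", 2,   4434010, 1283040, 3.46),
    ("fp32", 2.5, 4923864, 3440900, 1.43),
    ("fp16", 2,   3196507, 1083980, 2.95),
    ("fp16", 2.5, 3142656, 1245010, 2.52),
]


def improvement(rocm, miopen):
    """Speedup of MIOpen over ROCm PyTorch, assuming lower measurements are better."""
    return round(rocm / miopen, 2)


for dtype, p, rocm, miopen, reported in rows:
    computed = improvement(rocm, miopen)
    print(f"{dtype} p={p}: computed {computed:.2f}, reported {reported:.2f}")
```

The gap between p = 2 and p = 2.5 is much wider for fp32 than for fp16 in these rows, which matches the pattern across the full tables.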

littlecutebird · Feb 12 '25 11:02