MIOpen
Implement SoftmaxCrossEntropyWithLogits
- Added the forward and backward SoftmaxCrossEntropyWithLogits operations and kernels for contiguous tensors.
- Added a driver test and a gtest for SoftmaxCrossEntropyWithLogits.
- The new API is guarded by the MIOPEN_BETA_API macro.
- Average improvement over ROCm across all cases:
| type | Forward | Backward |
|---|---|---|
| float16 | 3.01 | 4.27 |
| float32 | 2.72 | 2.99 |
| bfloat16 | 2.99 | 4.43 |
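
For readers unfamiliar with the operation, here is a minimal CPU sketch of what the forward and backward passes compute. It assumes the conventional definition used by the TensorFlow op of the same name: per-row loss `-sum(target * log_softmax(input))` and a `backprop` tensor equal to `softmax(input) - target`. The function names, argument order, and the use of `std::vector<float>` are hypothetical and chosen for illustration; they are not MIOpen's API, and the actual kernels may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not MIOpen's API or kernel).
// `input` and `target` are contiguous row-major tensors of shape [batch, classes].
// Writes the per-row loss into `loss` and softmax(input) - target into `backprop`.
void SoftmaxCrossEntropyWithLogitsForwardRef(const std::vector<float>& input,
                                             const std::vector<float>& target,
                                             std::size_t batch,
                                             std::size_t classes,
                                             std::vector<float>& loss,
                                             std::vector<float>& backprop)
{
    loss.assign(batch, 0.0f);
    backprop.assign(batch * classes, 0.0f);
    if(classes == 0)
        return;

    for(std::size_t n = 0; n < batch; ++n)
    {
        const float* x = input.data() + n * classes;
        const float* t = target.data() + n * classes;

        // Subtract the row maximum before exponentiating for numerical stability.
        const float max_x = *std::max_element(x, x + classes);
        float sum         = 0.0f;
        for(std::size_t c = 0; c < classes; ++c)
            sum += std::exp(x[c] - max_x);
        const float log_sum = std::log(sum);

        for(std::size_t c = 0; c < classes; ++c)
        {
            const float log_softmax = x[c] - max_x - log_sum;
            loss[n] -= t[c] * log_softmax;
            backprop[n * classes + c] = std::exp(log_softmax) - t[c];
        }
    }
}

// Hypothetical CPU reference for the backward pass: scales each row of
// `backprop` by the incoming gradient of that row's loss.
void SoftmaxCrossEntropyWithLogitsBackwardRef(const std::vector<float>& output_grad,
                                              const std::vector<float>& backprop,
                                              std::size_t batch,
                                              std::size_t classes,
                                              std::vector<float>& input_grad)
{
    input_grad.assign(batch * classes, 0.0f);
    for(std::size_t n = 0; n < batch; ++n)
        for(std::size_t c = 0; c < classes; ++c)
            input_grad[n * classes + c] = output_grad[n] * backprop[n * classes + c];
}
```

A reference of this kind is the sort of thing a gtest could compare GPU output against, with a tolerance that is looser for float16 and bfloat16 than for float32.
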
FP16
| op_name | dtype | size | direction | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|---|
| SoftmaxCrossEntropyWithLogits | float16 | [20 30] | fwd | 34080 | 8071 | 4.22252509 |
| SoftmaxCrossEntropyWithLogits | float16 | [20 30] | bwd | 57758 | 7129 | 8.101837565 |
| SoftmaxCrossEntropyWithLogits | float16 | [5 10] | fwd | 37919 | 9813 | 3.864159788 |
| SoftmaxCrossEntropyWithLogits | float16 | [5 10] | bwd | 37119 | 8338 | 4.451786999 |
| SoftmaxCrossEntropyWithLogits | float16 | [2 5] | fwd | 31679 | 10293 | 3.077722724 |
| SoftmaxCrossEntropyWithLogits | float16 | [2 5] | bwd | 33118 | 8924 | 3.711116091 |
| SoftmaxCrossEntropyWithLogits | float16 | [25 300] | fwd | 32159 | 11324 | 2.839897563 |
| SoftmaxCrossEntropyWithLogits | float16 | [25 300] | bwd | 59038 | 10080 | 5.856944444 |
| SoftmaxCrossEntropyWithLogits | float16 | [25 100] | fwd | 34719 | 8107 | 4.282595288 |
| SoftmaxCrossEntropyWithLogits | float16 | [25 100] | bwd | 46079 | 7662 | 6.013965022 |
| SoftmaxCrossEntropyWithLogits | float16 | [100 20] | fwd | 27519 | 6809 | 4.041562638 |
| SoftmaxCrossEntropyWithLogits | float16 | [100 20] | bwd | 37278 | 7129 | 5.229064385 |
| SoftmaxCrossEntropyWithLogits | float16 | [100 10] | fwd | 27840 | 6169 | 4.512887016 |
| SoftmaxCrossEntropyWithLogits | float16 | [100 10] | bwd | 55839 | 6347 | 8.797699701 |
| SoftmaxCrossEntropyWithLogits | float16 | [2000 3000] | fwd | 163515 | 133065 | 1.228835532 |
| SoftmaxCrossEntropyWithLogits | float16 | [2000 3000] | bwd | 225273 | 120549 | 1.86872558 |
| SoftmaxCrossEntropyWithLogits | float16 | [25 1000] | fwd | 32799 | 17599 | 1.863685437 |
| SoftmaxCrossEntropyWithLogits | float16 | [25 1000] | bwd | 49598 | 16960 | 2.924410377 |
| SoftmaxCrossEntropyWithLogits | float16 | [1000 100] | fwd | 37279 | 9635 | 3.869122989 |
| SoftmaxCrossEntropyWithLogits | float16 | [1000 100] | bwd | 44959 | 8123 | 5.534777791 |
FP32
| op_name | dtype | size | direction | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|---|
| SoftmaxCrossEntropyWithLogits | float32 | [20 30] | fwd | 29439 | 8267 | 3.561025765 |
| SoftmaxCrossEntropyWithLogits | float32 | [20 30] | bwd | 29440 | 7164 | 4.109436069 |
| SoftmaxCrossEntropyWithLogits | float32 | [5 10] | fwd | 33760 | 9867 | 3.42150603 |
| SoftmaxCrossEntropyWithLogits | float32 | [5 10] | bwd | 20319 | 8249 | 2.463207662 |
| SoftmaxCrossEntropyWithLogits | float32 | [2 5] | fwd | 24960 | 9706 | 2.571605193 |
| SoftmaxCrossEntropyWithLogits | float32 | [2 5] | bwd | 20160 | 8409 | 2.397431324 |
| SoftmaxCrossEntropyWithLogits | float32 | [25 300] | fwd | 25440 | 11164 | 2.278753135 |
| SoftmaxCrossEntropyWithLogits | float32 | [25 300] | bwd | 28799 | 10933 | 2.634135187 |
| SoftmaxCrossEntropyWithLogits | float32 | [25 100] | fwd | 31839 | 8462 | 3.762585677 |
| SoftmaxCrossEntropyWithLogits | float32 | [25 100] | bwd | 25280 | 8071 | 3.13220171 |
| SoftmaxCrossEntropyWithLogits | float32 | [100 20] | fwd | 27998 | 7413 | 3.776878457 |
| SoftmaxCrossEntropyWithLogits | float32 | [100 20] | bwd | 24958 | 7324 | 3.40770071 |
| SoftmaxCrossEntropyWithLogits | float32 | [100 10] | fwd | 27199 | 6862 | 3.963713203 |
| SoftmaxCrossEntropyWithLogits | float32 | [100 10] | bwd | 25599 | 6898 | 3.711075674 |
| SoftmaxCrossEntropyWithLogits | float32 | [2000 3000] | fwd | 169915 | 200779 | 0.846278744 |
| SoftmaxCrossEntropyWithLogits | float32 | [2000 3000] | bwd | 171355 | 187411 | 0.914327334 |
| SoftmaxCrossEntropyWithLogits | float32 | [25 1000] | fwd | 32799 | 18115 | 1.810598951 |
| SoftmaxCrossEntropyWithLogits | float32 | [25 1000] | bwd | 33919 | 17173 | 1.975135387 |
| SoftmaxCrossEntropyWithLogits | float32 | [1000 100] | fwd | 33120 | 9884 | 3.350870093 |
| SoftmaxCrossEntropyWithLogits | float32 | [1000 100] | bwd | 34079 | 8498 | 4.010237703 |
BFP16
| op_name | dtype | size | direction | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|---|
| SoftmaxCrossEntropyWithLogits | bfloat16 | [20 30] | fwd | 36313 | 7822 | 4.642418819 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [20 30] | bwd | 63667 | 7093 | 8.976032708 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [5 10] | fwd | 38552 | 10507 | 3.669172932 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [5 10] | bwd | 39352 | 8746 | 4.49942831 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [2 5] | fwd | 40470 | 11004 | 3.677753544 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [2 5] | bwd | 39671 | 8622 | 4.601136627 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [25 300] | fwd | 32154 | 11111 | 2.893888939 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [25 300] | bwd | 64466 | 10346 | 6.231007153 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [25 100] | fwd | 35672 | 8444 | 4.224538134 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [25 100] | bwd | 41272 | 8320 | 4.960576923 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [100 20] | fwd | 31033 | 7324 | 4.237165483 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [100 20] | bwd | 43192 | 6933 | 6.2299149 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [100 10] | fwd | 34393 | 6222 | 5.527643844 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [100 10] | bwd | 44629 | 5955 | 7.494374475 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [2000 3000] | fwd | 160445 | 134291 | 1.194756164 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [2000 3000] | bwd | 232272 | 121900 | 1.905430681 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [25 1000] | fwd | 35992 | 18009 | 1.998556277 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [25 1000] | bwd | 66546 | 17120 | 3.88703271 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [1000 100] | fwd | 38072 | 9831 | 3.872647747 |
| SoftmaxCrossEntropyWithLogits | bfloat16 | [1000 100] | bwd | 54389 | 8427 | 6.454135517 |