MIOpen
Implement SparseSoftmaxCrossEntropyWithLogits
- Added SparseSoftmaxCrossEntropyWithLogits forward and backward.
- Added driver test and gtest for SparseSoftmaxCrossEntropyWithLogits.
- New API is guarded by the MIOPEN_BETA_API macro.
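For readers unfamiliar with the operation, the following is a minimal CPU sketch of what a sparse softmax cross-entropy forward and backward pass computes. It assumes the commonly used definition (per-row loss `logsumexp(logits) - logits[label]`, gradient `softmax(logits) - one_hot(label)` scaled by the upstream gradient). The function names and list-based layout are hypothetical and are not the MIOpen API added in this PR.

```python
import math

def sparse_softmax_xent_forward(logits, labels):
    """Per-row loss: logsumexp(row) - row[label]."""
    losses = []
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the row max for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(lse - row[label])
    return losses

def sparse_softmax_xent_backward(logits, labels, grad_losses):
    """Gradient w.r.t. logits: (softmax(row) - one_hot(label)) * upstream grad."""
    grads = []
    for row, label, g in zip(logits, labels, grad_losses):
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        grads.append([
            (e / total - (1.0 if j == label else 0.0)) * g
            for j, e in enumerate(exps)
        ])
    return grads

# Hypothetical input: 2 rows (batch), 3 classes, one integer label per row.
logits = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
labels = [2, 0]
losses = sparse_softmax_xent_forward(logits, labels)
grads = sparse_softmax_xent_backward(logits, labels, [1.0, 1.0])
```

The labels are class indices rather than one-hot vectors, which is what "sparse" refers to here. Each gradient row sums to zero when the upstream gradient is 1.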
- Average ROCm / MIOpen kernel ratio for SparseSoftmaxCrossEntropyWithLogits over all cases:
| Type | Forward | Backward |
|---|---|---|
| float16 | 1.84 | 3.29 |
| float32 | 1.68 | 3.09 |
| bfloat16 | 1.85 | 3.31 |
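The `ROCm / MIOpen` column in the per-case tables is the ROCm kernel average divided by the MIOpen kernel average. The sketch below reproduces that arithmetic for the first two FWD - FP16 rows. The helper name is hypothetical, and averaging the per-case ratios is an assumption: the PR does not state how the summary figures were aggregated, and they are described as being over all cases, which may include shapes not listed in the tables.

```python
def speedup(rocm_kernel_avg, miopen_kernel_avg):
    """Ratio reported in the 'ROCm / MIOpen' column."""
    return rocm_kernel_avg / miopen_kernel_avg

# (ROCm kernel avg, MIOpen kernel avg) for the first two FWD - FP16 rows.
rows = [(12272, 5546), (15280, 6133)]
ratios = [speedup(rocm, miopen) for rocm, miopen in rows]

# Assumed aggregation: arithmetic mean of the per-case ratios.
mean_ratio = sum(ratios) / len(ratios)
print(ratios, mean_ratio)
```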
FWD - FP16
| op_name | dtype | size | direction | ROCm kernel avg | MIOpen kernel avg | ROCm / MIOpen |
|---|---|---|---|---|---|---|
| SparseSoftmaxCrossEntropyWithLogits | float16 | [128 4] | fwd | 12272 | 5546 | 2.212765957 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [100 20] | fwd | 15280 | 6133 | 2.491439752 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [100 10] | fwd | 16304 | 5742 | 2.83942877 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [25 1000] | fwd | 22944 | 13920 | 1.648275862 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [1000 100] | fwd | 18960 | 8373 | 2.264421354 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [237 80] | fwd | 16880 | 7058 | 2.391612355 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [255 80] | fwd | 16464 | 7360 | 2.236956522 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [258 80] | fwd | 16176 | 7342 | 2.203214383 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [261 80] | fwd | 15728 | 7324 | 2.147460404 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [267 80] | fwd | 16144 | 7147 | 2.258849867 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [768 80] | fwd | 17872 | 7520 | 2.376595745 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [800 237] | fwd | 18480 | 9262 | 1.995249406 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [800 585] | fwd | 23744 | 13991 | 1.697090987 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [800 645] | fwd | 24912 | 14524 | 1.715229964 |
FWD - FP32
| op_name | dtype | size | direction | ROCm kernel avg | MIOpen kernel avg | ROCm / MIOpen |
|---|---|---|---|---|---|---|
| SparseSoftmaxCrossEntropyWithLogits | float32 | [147 4] | fwd | 12432 | 5724 | 2.171907757 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [20 30] | fwd | 14576 | 7751 | 1.880531544 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [5 10] | fwd | 13200 | 10027 | 1.316445597 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [2 5] | fwd | 12128 | 10471 | 1.158246586 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [25 300] | fwd | 16560 | 9387 | 1.764141898 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [25 100] | fwd | 16176 | 7680 | 2.10625 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [100 20] | fwd | 13296 | 7005 | 1.898072805 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [100 10] | fwd | 13520 | 6080 | 2.223684211 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [25 1000] | fwd | 19360 | 14347 | 1.349411027 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [1000 100] | fwd | 16832 | 8462 | 1.989127866 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [237 80] | fwd | 14128 | 7129 | 1.981764623 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [663 80] | fwd | 14928 | 7662 | 1.948316366 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [800 237] | fwd | 16672 | 9333 | 1.786349512 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [800 285] | fwd | 16880 | 10755 | 1.569502557 |
FWD - BFP16
| op_name | dtype | size | direction | ROCm kernel avg | MIOpen kernel avg | ROCm / MIOpen |
|---|---|---|---|---|---|---|
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [20 30] | fwd | 16192 | 7947 | 2.037498427 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [5 10] | fwd | 15808 | 10311 | 1.533119969 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [2 5] | fwd | 14448 | 10773 | 1.341130604 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [25 300] | fwd | 20048 | 9458 | 2.119687037 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [25 100] | fwd | 18976 | 7129 | 2.6618039 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [100 20] | fwd | 15456 | 6169 | 2.505430378 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [100 10] | fwd | 16176 | 5707 | 2.834413878 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [25 1000] | fwd | 26976 | 14454 | 1.866334579 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [1000 100] | fwd | 19936 | 8391 | 2.375878918 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [237 80] | fwd | 17584 | 7093 | 2.479063866 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [255 80] | fwd | 16992 | 7342 | 2.314355761 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [800 372] | fwd | 20496 | 10969 | 1.868538609 |
BWD - FP16
| op_name | dtype | size | direction | ROCm kernel avg | MIOpen kernel avg | ROCm / MIOpen |
|---|---|---|---|---|---|---|
| SparseSoftmaxCrossEntropyWithLogits | float16 | [20 30] | bwd | 16496 | 3537 | 4.663839412 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [5 10] | bwd | 14848 | 3182 | 4.666247643 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [2 5] | bwd | 14320 | 3431 | 4.173710289 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [25 300] | bwd | 18608 | 5653 | 3.29170352 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [25 100] | bwd | 16912 | 4409 | 3.835790429 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [100 20] | bwd | 14000 | 4249 | 3.294892916 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [100 10] | bwd | 15184 | 3946 | 3.847947288 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [2000 3000] | bwd | 71695 | 32962 | 2.175080396 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [25 1000] | bwd | 27264 | 6364 | 4.284098052 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [1000 100] | bwd | 18528 | 5778 | 3.206645898 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [237 80] | bwd | 16512 | 5280 | 3.127272727 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [489 80] | bwd | 16464 | 5049 | 3.260843731 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [693 80] | bwd | 15968 | 4924 | 3.242891958 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [744 80] | bwd | 16192 | 4978 | 3.252711933 |
| SparseSoftmaxCrossEntropyWithLogits | float16 | [800 261] | bwd | 18704 | 5458 | 3.426896299 |
BWD - FP32
| op_name | dtype | size | direction | ROCm kernel avg | MIOpen kernel avg | ROCm / MIOpen |
|---|---|---|---|---|---|---|
| SparseSoftmaxCrossEntropyWithLogits | float32 | [20 30] | bwd | 14192 | 3982 | 3.564038172 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [5 10] | bwd | 13088 | 3058 | 4.279921517 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [2 5] | bwd | 13584 | 3324 | 4.086642599 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [25 300] | bwd | 16656 | 5600 | 2.974285714 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [25 100] | bwd | 15840 | 5227 | 3.030418978 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [100 20] | bwd | 14352 | 4995 | 2.873273273 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [100 10] | bwd | 14432 | 3929 | 3.673199287 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [2000 3000] | bwd | 106559 | 54900 | 1.940965392 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [25 1000] | bwd | 21808 | 6329 | 3.445726023 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [1000 100] | bwd | 16144 | 5315 | 3.037441204 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [363 80] | bwd | 13888 | 4782 | 2.904224174 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [534 80] | bwd | 14080 | 4711 | 2.988749735 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [744 80] | bwd | 14288 | 4728 | 3.021996616 |
| SparseSoftmaxCrossEntropyWithLogits | float32 | [800 255] | bwd | 16768 | 5333 | 3.144196512 |
BWD - BFP16
| op_name | dtype | size | direction | ROCm kernel avg | MIOpen kernel avg | ROCm / MIOpen |
|---|---|---|---|---|---|---|
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [20 30] | bwd | 16672 | 3662 | 4.552703441 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [5 10] | bwd | 14704 | 3200 | 4.595 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [2 5] | bwd | 14672 | 3982 | 3.684580613 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [25 300] | bwd | 19552 | 5475 | 3.571141553 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [25 100] | bwd | 17472 | 4640 | 3.765517241 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [100 20] | bwd | 14752 | 4231 | 3.486646183 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [100 10] | bwd | 15488 | 3644 | 4.250274424 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [2000 3000] | bwd | 72624 | 33566 | 2.163617947 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [25 1000] | bwd | 27936 | 6364 | 4.389692018 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [1000 100] | bwd | 18944 | 5795 | 3.269025022 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [237 80] | bwd | 16400 | 5263 | 3.116093483 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [489 80] | bwd | 16128 | 4995 | 3.228828829 |
| SparseSoftmaxCrossEntropyWithLogits | bfloat16 | [800 324] | bwd | 19184 | 5991 | 3.202136538 |