MIOpen
Implement Var
- Added backward Var operation and kernel.
- Added driver test and gtest for Var.
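For reference, the gradient that a backward Var computes can be sketched in plain Python. This is a minimal illustration over a single 1-D reduction, not the MIOpen kernel: the function names `var_forward` and `var_backward` and the `correction` parameter (1 for the unbiased estimator) are assumptions made for the example. With `var = sum((x_j - mean)^2) / (n - correction)`, the derivative with respect to `x_i` is `2 * (x_i - mean) / (n - correction)`, because the terms coming from the mean sum to zero.

```python
def var_forward(x, correction=1):
    """Variance of a 1-D list; correction=1 gives the unbiased estimator (assumed default)."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - correction)


def var_backward(x, grad_out, correction=1):
    """Gradient of var_forward w.r.t. each x_i, scaled by the upstream gradient grad_out."""
    n = len(x)
    mean = sum(x) / n
    scale = 2.0 * grad_out / (n - correction)
    return [scale * (xi - mean) for xi in x]


def numeric_grad(x, i, eps=1e-6, correction=1):
    """Central finite difference of var_forward along x_i, used only as a sanity check."""
    hi = list(x)
    lo = list(x)
    hi[i] += eps
    lo[i] -= eps
    return (var_forward(hi, correction) - var_forward(lo, correction)) / (2 * eps)


x = [1.0, 2.0, 3.0, 4.0]
grads = var_backward(x, grad_out=1.0)
```

Comparing `grads[i]` against `numeric_grad(x, i)` is a quick way to check the formula; a driver test or gtest would typically make the same kind of comparison against a CPU reference.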
Compared with ROCm, the new MIOpen Var kernel is faster for input sizes in the range 1024 to 1024 * 1024 * 2. In the tables below, Improvement is the ROCm measurement divided by the MIOpen measurement, and geomean is the geometric mean of that ratio.
| Type | Direction | geomean |
|---|---|---|
| fp32 | bwd | 3.61 |
| fp16 | bwd | 3.9 |
| bfp16 | bwd | 4.28 |
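The Improvement and geomean figures can be reproduced with a few lines of Python. This is a small sketch, assuming Improvement is the ROCm value divided by the MIOpen value (which matches the per-case rows) and using only three float32 rows as sample data, so the resulting geometric mean is not expected to equal the summary value above. The names `rows` and `improvement` are illustrative.

```python
from statistics import geometric_mean

# (ROCm, MIOpen) pairs copied from three float32 rows of the per-case table.
rows = [
    (29150, 9689),  # [2048 1 1], contiguous
    (29150, 9227),  # [2048 1 1], noncontiguous
    (27918, 9227),  # [8192 1 1], contiguous
]


def improvement(rocm, miopen):
    """Speedup ratio: values above 1 mean MIOpen is faster."""
    return rocm / miopen


ratios = [improvement(rocm, miopen) for rocm, miopen in rows]
rounded = [round(r, 2) for r in ratios]

# Geometric mean is the usual way to average ratios across benchmark cases.
sample_geomean = geometric_mean(ratios)
print(rounded, round(sample_geomean, 2))
```

The geometric mean is used rather than the arithmetic mean so that a single large ratio does not dominate the summary.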
float32
| op_name | dtype | size | dim | contiguous | direction | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|---|---|---|
| Var | float32 | [2048 1 1] | [0] | contiguous | bwd | 29150 | 9689 | 3.01 |
| Var | float32 | [2048 1 1] | [0] | noncontiguous | bwd | 29150 | 9227 | 3.16 |
| Var | float32 | [8192 1 1] | [0] | contiguous | bwd | 27918 | 9227 | 3.03 |
| Var | float32 | [8192 1 1] | [0] | noncontiguous | bwd | 29790 | 8391 | 3.55 |
| Var | float32 | [65536 1 1] | [0] | contiguous | bwd | 38894 | 9476 | 4.10 |
| Var | float32 | [65536 1 1] | [0] | noncontiguous | bwd | 41230 | 9102 | 4.53 |
| Var | float32 | [80 40 1] | [0] | contiguous | bwd | 42462 | 9209 | 4.61 |
| Var | float32 | [80 40 1] | [0] | noncontiguous | bwd | 41486 | 12604 | 3.29 |
| Var | float32 | [80 250 1] | [0] | contiguous | bwd | 35838 | 8675 | 4.13 |
| Var | float32 | [80 250 1] | [0] | noncontiguous | bwd | 35998 | 12427 | 2.90 |
| Var | float32 | [40 50 30] | [0] | contiguous | bwd | 35406 | 9173 | 3.86 |
| Var | float32 | [40 50 30] | [0] | noncontiguous | bwd | 39469 | 15147 | 2.61 |
| Var | float32 | [30 50 10] | [0] | contiguous | bwd | 42814 | 8373 | 5.11 |
| Var | float32 | [30 50 10] | [0] | noncontiguous | bwd | 37678 | 13173 | 2.86 |
| Var | float32 | [50 40 50] | [0] | contiguous | bwd | 36910 | 9422 | 3.92 |
| Var | float32 | [50 40 50] | [0] | noncontiguous | bwd | 43630 | 16658 | 2.62 |
| Var | float32 | [40 60 10] | [0] | contiguous | bwd | 37374 | 8018 | 4.66 |
| Var | float32 | [40 60 10] | [0] | noncontiguous | bwd | 37134 | 12604 | 2.95 |
| Var | float32 | [30 50 20] | [1] | contiguous | bwd | 38462 | 8480 | 4.54 |
| Var | float32 | [30 50 20] | [1] | noncontiguous | bwd | 39870 | 14169 | 2.81 |
float16
| op_name | dtype | size | dim | contiguous | direction | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|---|---|---|
| Var | float16 | [2048 1 1] | [0] | contiguous | bwd | 36558 | 9689 | 3.77 |
| Var | float16 | [2048 1 1] | [0] | noncontiguous | bwd | 33518 | 9440 | 3.55 |
| Var | float16 | [8192 1 1] | [0] | contiguous | bwd | 31230 | 9653 | 3.24 |
| Var | float16 | [8192 1 1] | [0] | noncontiguous | bwd | 33070 | 9760 | 3.39 |
| Var | float16 | [65536 1 1] | [0] | contiguous | bwd | 37726 | 9671 | 3.90 |
| Var | float16 | [65536 1 1] | [0] | noncontiguous | bwd | 42126 | 9475 | 4.45 |
| Var | float16 | [80 40 1] | [0] | contiguous | bwd | 46318 | 9938 | 4.66 |
| Var | float16 | [80 40 1] | [0] | noncontiguous | bwd | 52381 | 12533 | 4.18 |
| Var | float16 | [80 250 1] | [0] | contiguous | bwd | 41342 | 8853 | 4.67 |
| Var | float16 | [80 250 1] | [0] | noncontiguous | bwd | 43662 | 13013 | 3.36 |
| Var | float16 | [40 50 30] | [0] | contiguous | bwd | 40590 | 8409 | 4.83 |
| Var | float16 | [40 50 30] | [0] | noncontiguous | bwd | 45086 | 15982 | 2.82 |
| Var | float16 | [30 50 10] | [0] | contiguous | bwd | 40398 | 9137 | 4.42 |
| Var | float16 | [30 50 10] | [0] | noncontiguous | bwd | 45101 | 14187 | 3.18 |
| Var | float16 | [50 40 50] | [0] | contiguous | bwd | 43678 | 8960 | 4.87 |
| Var | float16 | [50 40 50] | [0] | noncontiguous | bwd | 49054 | 17778 | 2.76 |
| Var | float16 | [40 60 10] | [0] | contiguous | bwd | 40318 | 9333 | 4.32 |
| Var | float16 | [40 60 10] | [0] | noncontiguous | bwd | 45102 | 13511 | 3.34 |
| Var | float16 | [30 50 20] | [1] | contiguous | bwd | 43662 | 8551 | 5.11 |
| Var | float16 | [30 50 20] | [1] | noncontiguous | bwd | 45870 | 13795 | 3.33 |
bfloat16
| op_name | dtype | size | dim | contiguous | direction | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|---|---|---|
| Var | bfloat16 | [2048 1 1] | [0] | contiguous | bwd | 37502 | 10186 | 3.68 |
| Var | bfloat16 | [2048 1 1] | [0] | noncontiguous | bwd | 34718 | 9778 | 3.55 |
| Var | bfloat16 | [8192 1 1] | [0] | contiguous | bwd | 34334 | 9138 | 3.76 |
| Var | bfloat16 | [8192 1 1] | [0] | noncontiguous | bwd | 37086 | 8746 | 4.24 |
| Var | bfloat16 | [65536 1 1] | [0] | contiguous | bwd | 40846 | 8906 | 4.59 |
| Var | bfloat16 | [65536 1 1] | [0] | noncontiguous | bwd | 44606 | 8693 | 5.13 |
| Var | bfloat16 | [80 40 1] | [0] | contiguous | bwd | 50205 | 9333 | 5.38 |
| Var | bfloat16 | [80 40 1] | [0] | noncontiguous | bwd | 52909 | 12729 | 4.16 |
| Var | bfloat16 | [80 250 1] | [0] | contiguous | bwd | 42862 | 9618 | 4.46 |
| Var | bfloat16 | [80 250 1] | [0] | noncontiguous | bwd | 46174 | 12906 | 3.58 |
| Var | bfloat16 | [40 50 30] | [0] | contiguous | bwd | 45166 | 9439 | 4.79 |
| Var | bfloat16 | [40 50 30] | [0] | noncontiguous | bwd | 49197 | 16035 | 3.07 |
| Var | bfloat16 | [30 50 10] | [0] | contiguous | bwd | 46702 | 8960 | 5.21 |
| Var | bfloat16 | [30 50 10] | [0] | noncontiguous | bwd | 51310 | 12871 | 3.99 |
| Var | bfloat16 | [50 40 50] | [0] | contiguous | bwd | 46942 | 8995 | 5.22 |
| Var | bfloat16 | [50 40 50] | [0] | noncontiguous | bwd | 53149 | 17760 | 2.99 |
| Var | bfloat16 | [40 60 10] | [0] | contiguous | bwd | 43854 | 8622 | 5.09 |
| Var | bfloat16 | [40 60 10] | [0] | noncontiguous | bwd | 50142 | 14400 | 3.48 |
| Var | bfloat16 | [30 50 20] | [1] | contiguous | bwd | 49310 | 8711 | 5.66 |
| Var | bfloat16 | [30 50 20] | [1] | noncontiguous | bwd | 50670 | 14009 | 3.62 |