MIOpen

Implement Var

Open: cognaiger9 opened this issue 10 months ago.

  • Added the backward Var operation and its kernel.
  • Added a driver test and a gtest for Var.
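For reference, the backward pass follows directly from the definition of variance. Let N be the number of elements in the reduced dimension, mean = (1/N) * sum(x_r), and Var = sum((x_r - mean)^2) / (N - c), where c is a Bessel-style correction (c = 1 for the unbiased estimator). Then the gradient is dx_r = dy * 2 * (x_r - mean) / (N - c). The term that comes from the mean's own dependence on x vanishes, because the deviations (x_r - mean) sum to zero.

The C++ below is a minimal CPU reference sketch. It is not the MIOpen kernel or API, and several parts are assumptions:
- the function name `var_backward_ref`,
- the `correction` parameter,
- reduction over a single dimension of a contiguous tensor viewed as [outer, reduce, inner].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the backward pass of Var along one dimension
// (not MIOpen's API). The contiguous input x is viewed as [outer, reduce, inner],
// where `reduce` is the length N of the reduced dimension.
//   forward:  var  = sum_r (x_r - mean)^2 / (N - correction)
//   backward: dx_r = dy * 2 * (x_r - mean) / (N - correction)
// dy holds one upstream gradient per (outer, inner) pair.
std::vector<float> var_backward_ref(const std::vector<float>& x,
                                    const std::vector<float>& dy,
                                    std::size_t outer, std::size_t reduce,
                                    std::size_t inner, std::size_t correction) {
    assert(x.size() == outer * reduce * inner);
    assert(dy.size() == outer * inner);
    assert(reduce > correction);  // otherwise the divisor is zero or negative
    const float denom = static_cast<float>(reduce - correction);
    std::vector<float> dx(x.size());
    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t i = 0; i < inner; ++i) {
            // Mean of the slice that is reduced into a single output element.
            double sum = 0.0;
            for (std::size_t r = 0; r < reduce; ++r)
                sum += x[(o * reduce + r) * inner + i];
            const float mean = static_cast<float>(sum / static_cast<double>(reduce));
            const float g = dy[o * inner + i];
            // The mean's own dependence on x contributes nothing, because the
            // deviations (x_r - mean) sum to zero.
            for (std::size_t r = 0; r < reduce; ++r) {
                const std::size_t idx = (o * reduce + r) * inner + i;
                dx[idx] = g * 2.0f * (x[idx] - mean) / denom;
            }
        }
    }
    return dx;
}
```

Under this view, the `[30 50 20]` case reduced over dim `[1]` in the benchmark tables corresponds to outer = 30, reduce = 50, inner = 20. A reference like this is useful for checking a GPU kernel's output element by element, for example in a unit test.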

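Compared with ROCm, the new MIOpen Var kernel shows a performance improvement for input sizes between 1024 and 1024 * 1024 * 2 (2,097,152) elements. In the tables below, Improvement is the ratio ROCm / MIOpen, and geomean is the geometric mean of that ratio for each data type.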

Type    Direction  geomean
fp32    bwd        3.61
fp16    bwd        3.9
bfp16   bwd        4.28
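Both summary columns are plain arithmetic. Each row's Improvement is ROCm / MIOpen; in the first float32 row, 29150 / 9689 ≈ 3.01. A geometric mean then summarizes those ratios per data type.

The C++ below is a minimal sketch with hypothetical helper names (`improvement`, `geomean`). It averages in log space, which is equivalent to taking the N-th root of the product but avoids overflow. Applied to only the 20 float32 rows listed here, it gives about 3.53 rather than the reported 3.61. The summary figures presumably cover more cases than the excerpt shows.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Improvement for one benchmark row: ROCm column divided by MIOpen column.
double improvement(double rocm, double miopen) { return rocm / miopen; }

// Geometric mean of positive ratios, computed as exp(mean(log(r))) so that
// multiplying many ratios together cannot overflow.
double geomean(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```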

float32
op_name  dtype     size         dim  contiguous     direction  ROCm   MIOpen  Improvement
Var      float32   [2048 1 1]   [0]  contiguous     bwd        29150  9689    3.01
Var      float32   [2048 1 1]   [0]  noncontiguous  bwd        29150  9227    3.16
Var      float32   [8192 1 1]   [0]  contiguous     bwd        27918  9227    3.03
Var      float32   [8192 1 1]   [0]  noncontiguous  bwd        29790  8391    3.55
Var      float32   [65536 1 1]  [0]  contiguous     bwd        38894  9476    4.10
Var      float32   [65536 1 1]  [0]  noncontiguous  bwd        41230  9102    4.53
Var      float32   [80 40 1]    [0]  contiguous     bwd        42462  9209    4.61
Var      float32   [80 40 1]    [0]  noncontiguous  bwd        41486  12604   3.29
Var      float32   [80 250 1]   [0]  contiguous     bwd        35838  8675    4.13
Var      float32   [80 250 1]   [0]  noncontiguous  bwd        35998  12427   2.90
Var      float32   [40 50 30]   [0]  contiguous     bwd        35406  9173    3.86
Var      float32   [40 50 30]   [0]  noncontiguous  bwd        39469  15147   2.61
Var      float32   [30 50 10]   [0]  contiguous     bwd        42814  8373    5.11
Var      float32   [30 50 10]   [0]  noncontiguous  bwd        37678  13173   2.86
Var      float32   [50 40 50]   [0]  contiguous     bwd        36910  9422    3.92
Var      float32   [50 40 50]   [0]  noncontiguous  bwd        43630  16658   2.62
Var      float32   [40 60 10]   [0]  contiguous     bwd        37374  8018    4.66
Var      float32   [40 60 10]   [0]  noncontiguous  bwd        37134  12604   2.95
Var      float32   [30 50 20]   [1]  contiguous     bwd        38462  8480    4.54
Var      float32   [30 50 20]   [1]  noncontiguous  bwd        39870  14169   2.81

float16
op_name  dtype     size         dim  contiguous     direction  ROCm   MIOpen  Improvement
Var      float16   [2048 1 1]   [0]  contiguous     bwd        36558  9689    3.77
Var      float16   [2048 1 1]   [0]  noncontiguous  bwd        33518  9440    3.55
Var      float16   [8192 1 1]   [0]  contiguous     bwd        31230  9653    3.24
Var      float16   [8192 1 1]   [0]  noncontiguous  bwd        33070  9760    3.39
Var      float16   [65536 1 1]  [0]  contiguous     bwd        37726  9671    3.90
Var      float16   [65536 1 1]  [0]  noncontiguous  bwd        42126  9475    4.45
Var      float16   [80 40 1]    [0]  contiguous     bwd        46318  9938    4.66
Var      float16   [80 40 1]    [0]  noncontiguous  bwd        52381  12533   4.18
Var      float16   [80 250 1]   [0]  contiguous     bwd        41342  8853    4.67
Var      float16   [80 250 1]   [0]  noncontiguous  bwd        43662  13013   3.36
Var      float16   [40 50 30]   [0]  contiguous     bwd        40590  8409    4.83
Var      float16   [40 50 30]   [0]  noncontiguous  bwd        45086  15982   2.82
Var      float16   [30 50 10]   [0]  contiguous     bwd        40398  9137    4.42
Var      float16   [30 50 10]   [0]  noncontiguous  bwd        45101  14187   3.18
Var      float16   [50 40 50]   [0]  contiguous     bwd        43678  8960    4.87
Var      float16   [50 40 50]   [0]  noncontiguous  bwd        49054  17778   2.76
Var      float16   [40 60 10]   [0]  contiguous     bwd        40318  9333    4.32
Var      float16   [40 60 10]   [0]  noncontiguous  bwd        45102  13511   3.34
Var      float16   [30 50 20]   [1]  contiguous     bwd        43662  8551    5.11
Var      float16   [30 50 20]   [1]  noncontiguous  bwd        45870  13795   3.33

bfloat16
op_name  dtype     size         dim  contiguous     direction  ROCm   MIOpen  Improvement
Var      bfloat16  [2048 1 1]   [0]  contiguous     bwd        37502  10186   3.68
Var      bfloat16  [2048 1 1]   [0]  noncontiguous  bwd        34718  9778    3.55
Var      bfloat16  [8192 1 1]   [0]  contiguous     bwd        34334  9138    3.76
Var      bfloat16  [8192 1 1]   [0]  noncontiguous  bwd        37086  8746    4.24
Var      bfloat16  [65536 1 1]  [0]  contiguous     bwd        40846  8906    4.59
Var      bfloat16  [65536 1 1]  [0]  noncontiguous  bwd        44606  8693    5.13
Var      bfloat16  [80 40 1]    [0]  contiguous     bwd        50205  9333    5.38
Var      bfloat16  [80 40 1]    [0]  noncontiguous  bwd        52909  12729   4.16
Var      bfloat16  [80 250 1]   [0]  contiguous     bwd        42862  9618    4.46
Var      bfloat16  [80 250 1]   [0]  noncontiguous  bwd        46174  12906   3.58
Var      bfloat16  [40 50 30]   [0]  contiguous     bwd        45166  9439    4.79
Var      bfloat16  [40 50 30]   [0]  noncontiguous  bwd        49197  16035   3.07
Var      bfloat16  [30 50 10]   [0]  contiguous     bwd        46702  8960    5.21
Var      bfloat16  [30 50 10]   [0]  noncontiguous  bwd        51310  12871   3.99
Var      bfloat16  [50 40 50]   [0]  contiguous     bwd        46942  8995    5.22
Var      bfloat16  [50 40 50]   [0]  noncontiguous  bwd        53149  17760   2.99
Var      bfloat16  [40 60 10]   [0]  contiguous     bwd        43854  8622    5.09
Var      bfloat16  [40 60 10]   [0]  noncontiguous  bwd        50142  14400   3.48
Var      bfloat16  [30 50 20]   [1]  contiguous     bwd        49310  8711    5.66
Var      bfloat16  [30 50 20]   [1]  noncontiguous  bwd        50670  14009   3.62
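The tables benchmark both contiguous and noncontiguous inputs. In a noncontiguous tensor, element (i, j, k) is not at the linear offset implied by its sizes alone, so a kernel has to address memory through per-dimension strides.

The C++ below is a minimal sketch of that mapping using hypothetical helpers (`contiguous_strides`, `offset_3d`, `is_contiguous`). It is not MIOpen's tensor-view API. The PR does not state how its noncontiguous test inputs were produced; a transposed view is used here only as one example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Dims3 = std::array<std::size_t, 3>;

// Hypothetical helper: strides (in elements) of a contiguous, row-major
// tensor with the given sizes, e.g. [30 50 20] -> {1000, 20, 1}.
Dims3 contiguous_strides(const Dims3& sizes) {
    return {sizes[1] * sizes[2], sizes[2], 1};
}

// Hypothetical helper: memory offset of logical element idx in a tensor
// described by sizes and strides. The same code handles contiguous and
// noncontiguous layouts; only the strides differ.
std::size_t offset_3d(const Dims3& idx, const Dims3& sizes, const Dims3& strides) {
    std::size_t off = 0;
    for (std::size_t d = 0; d < 3; ++d) {
        assert(idx[d] < sizes[d]);  // index must be in bounds
        off += idx[d] * strides[d];
    }
    return off;
}

// A tensor is contiguous when its strides match the row-major ones.
// Size-1 dimensions are skipped because their stride is never used.
bool is_contiguous(const Dims3& sizes, const Dims3& strides) {
    const Dims3 expected = contiguous_strides(sizes);
    for (std::size_t d = 0; d < 3; ++d)
        if (sizes[d] != 1 && strides[d] != expected[d]) return false;
    return true;
}
```

For example, swapping dims 0 and 1 of a contiguous [40 80 1] tensor (strides {80, 1, 1}) yields a [80 40 1] view with strides {1, 80, 1}. That view is noncontiguous, so neighbouring elements along dim 0 are adjacent in memory while those along dim 1 are 80 elements apart.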

cognaiger9, Feb 26 '25 04:02