
Implement Trace

Open cognaiger9 opened this issue 10 months ago • 0 comments

  • Add Trace operation with forward and backward kernels.
  • Add driver and gtest for kernels.
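For readers unfamiliar with the operation, here is a minimal reference sketch of what the forward and backward passes compute. It assumes Trace follows the usual definition (the sum of the main-diagonal entries of a 2-D matrix, including non-square inputs such as the `[34 4]` case benchmarked below). The function names `trace_forward` and `trace_backward` are hypothetical and are not the MIOpen API or the kernels proposed here.

```python
def trace_forward(matrix):
    """Sum the main-diagonal entries of a 2-D (possibly non-square) matrix."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    return sum(matrix[i][i] for i in range(min(rows, cols)))


def trace_backward(grad_output, rows, cols):
    """Gradient of the trace w.r.t. its input.

    Each diagonal element contributes to the output with weight 1, so the
    upstream scalar gradient lands on the diagonal and every other entry is 0.
    """
    grad_input = [[0.0] * cols for _ in range(rows)]
    for i in range(min(rows, cols)):
        grad_input[i][i] = grad_output
    return grad_input


x = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
y = trace_forward(x)             # 1.0 + 4.0
dx = trace_backward(1.0, 3, 2)   # ones on the diagonal of a 3x2 zero matrix
```

A reference like this is the kind of CPU-side check a gtest could compare the GPU kernels against, although the actual test design is not described in this issue.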

Average improvement over ROCm

| dtype | fwd | bwd |
| --- | --- | --- |
| float16 | 2.35 | 3.07 |
| float32 | 3.26 | 3.3 |
| bfloat16 | 2.44 | 3.84 |
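The averages above appear to be the mean of the per-shape Improvement values, where each Improvement is the ROCm measurement divided by the MIOpen measurement. That reading is an inference from the numbers and is not stated in the issue. A short sketch that recomputes the float16 forward figure from the detailed benchmark rows (the helper name `speedup` is hypothetical):

```python
# (ROCm, MIOpen) measurements copied from the float16 forward rows
# of the detailed benchmark.
float16_fwd = [
    (15808, 6080),
    (15824, 6009),
    (13008, 6080),
    (12464, 7733),
    (15168, 7840),
    (11888, 7662),
    (31967, 22952),
    (65006, 25352),
    (47343, 23130),
    (118285, 23521),
]


def speedup(baseline, candidate):
    """Ratio of the baseline measurement to the candidate measurement."""
    return baseline / candidate


ratios = [speedup(rocm, miopen) for rocm, miopen in float16_fwd]
average = sum(ratios) / len(ratios)
print(f"{average:.2f}")  # prints 2.35
```

A value above 1 means the MIOpen kernel was faster than the ROCm baseline for that shape.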

Detailed Benchmark

float16 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float16 | [34 4] | contiguous | fwd | 15808 | 6080 | 2.60 |
| Trace | float16 | [98 4] | contiguous | fwd | 15824 | 6009 | 2.63 |
| Trace | float16 | [190 4] | contiguous | fwd | 13008 | 6080 | 2.14 |
| Trace | float16 | [249 128] | contiguous | fwd | 12464 | 7733 | 1.61 |
| Trace | float16 | [349 222] | contiguous | fwd | 15168 | 7840 | 1.93 |
| Trace | float16 | [451 128] | contiguous | fwd | 11888 | 7662 | 1.55 |
| Trace | float16 | [2048 20480] | contiguous | fwd | 31967 | 22952 | 1.39 |
| Trace | float16 | [4096 45960] | contiguous | fwd | 65006 | 25352 | 2.56 |
| Trace | float16 | [8192 8192] | contiguous | fwd | 47343 | 23130 | 2.05 |
| Trace | float16 | [16384 16384] | contiguous | fwd | 118285 | 23521 | 5.03 |

float16 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float16 | [34 4] | contiguous | bwd | 92958 | 24766 | 3.75 |
| Trace | float16 | [98 4] | contiguous | bwd | 99566 | 29033 | 3.43 |
| Trace | float16 | [190 4] | contiguous | bwd | 107310 | 26544 | 4.04 |
| Trace | float16 | [249 128] | contiguous | bwd | 107614 | 24979 | 4.31 |
| Trace | float16 | [349 222] | contiguous | bwd | 67825 | 23966 | 2.83 |
| Trace | float16 | [451 128] | contiguous | bwd | 65054 | 27468 | 2.37 |
| Trace | float16 | [603 546] | contiguous | bwd | 74990 | 30757 | 2.44 |
| Trace | float16 | [1024 10240] | contiguous | bwd | 105630 | 67133 | 1.57 |
| Trace | float16 | [1024 1024] | contiguous | bwd | 90894 | 25708 | 3.54 |
| Trace | float16 | [2048 2048] | contiguous | bwd | 91998 | 37726 | 2.44 |

float32 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float32 | [34 4] | contiguous | fwd | 15936 | 5653 | 2.82 |
| Trace | float32 | [98 4] | contiguous | fwd | 14784 | 5796 | 2.55 |
| Trace | float32 | [190 4] | contiguous | fwd | 13808 | 5458 | 2.53 |
| Trace | float32 | [249 128] | contiguous | fwd | 12576 | 7556 | 1.66 |
| Trace | float32 | [349 222] | contiguous | fwd | 14944 | 7520 | 1.99 |
| Trace | float32 | [451 128] | contiguous | fwd | 10736 | 7840 | 1.37 |
| Trace | float32 | [2048 20480] | contiguous | fwd | 46959 | 25761 | 1.82 |
| Trace | float32 | [4096 45960] | contiguous | fwd | 79550 | 21192 | 3.75 |
| Trace | float32 | [4096 4096] | contiguous | fwd | 28544 | 22721 | 1.26 |
| Trace | float32 | [8192 8192] | contiguous | fwd | 68670 | 22792 | 3.01 |
| Trace | float32 | [16384 16384] | contiguous | fwd | 208124 | 21067 | 9.88 |

float32 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float32 | [34 4] | contiguous | bwd | 98350 | 24090 | 4.08 |
| Trace | float32 | [98 4] | contiguous | bwd | 97083 | 22277 | 4.36 |
| Trace | float32 | [190 4] | contiguous | bwd | 97902 | 23574 | 4.15 |
| Trace | float32 | [249 128] | contiguous | bwd | 98942 | 20908 | 4.73 |
| Trace | float32 | [349 222] | contiguous | bwd | 72878 | 24890 | 2.93 |
| Trace | float32 | [451 128] | contiguous | bwd | 63966 | 28748 | 2.23 |
| Trace | float32 | [603 546] | contiguous | bwd | 73470 | 27468 | 2.67 |
| Trace | float32 | [1024 10240] | contiguous | bwd | 157788 | 67115 | 2.35 |
| Trace | float32 | [1024 1024] | contiguous | bwd | 106190 | 30864 | 3.44 |
| Trace | float32 | [2048 2048] | contiguous | bwd | 148285 | 37140 | 3.99 |
| Trace | float32 | [4096 4096] | contiguous | bwd | 144429 | 99188 | 1.46 |

bfloat16 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | bfloat16 | [34 4] | contiguous | fwd | 16368 | 5902 | 2.77 |
| Trace | bfloat16 | [98 4] | contiguous | fwd | 16304 | 6382 | 2.55 |
| Trace | bfloat16 | [190 4] | contiguous | fwd | 13152 | 5867 | 2.24 |
| Trace | bfloat16 | [249 128] | contiguous | fwd | 13760 | 7662 | 1.80 |
| Trace | bfloat16 | [349 222] | contiguous | fwd | 13536 | 7698 | 1.76 |
| Trace | bfloat16 | [451 128] | contiguous | fwd | 12512 | 8071 | 1.55 |
| Trace | bfloat16 | [2048 20480] | contiguous | fwd | 32127 | 21281 | 1.51 |
| Trace | bfloat16 | [4096 45960] | contiguous | fwd | 65182 | 22970 | 2.84 |
| Trace | bfloat16 | [8192 8192] | contiguous | fwd | 47471 | 23539 | 2.02 |
| Trace | bfloat16 | [16384 16384] | contiguous | fwd | 119613 | 22490 | 5.32 |

bfloat16 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | bfloat16 | [34 4] | contiguous | bwd | 131661 | 25566 | 5.15 |
| Trace | bfloat16 | [98 4] | contiguous | bwd | 123069 | 23930 | 5.14 |
| Trace | bfloat16 | [190 4] | contiguous | bwd | 130157 | 22704 | 5.73 |
| Trace | bfloat16 | [249 128] | contiguous | bwd | 119470 | 24766 | 4.82 |
| Trace | bfloat16 | [349 222] | contiguous | bwd | 85134 | 24641 | 3.45 |
| Trace | bfloat16 | [451 128] | contiguous | bwd | 82382 | 23610 | 3.49 |
| Trace | bfloat16 | [603 546] | contiguous | bwd | 84830 | 33762 | 2.51 |
| Trace | bfloat16 | [1024 10240] | contiguous | bwd | 103645 | 68466 | 1.51 |
| Trace | bfloat16 | [1024 1024] | contiguous | bwd | 97166 | 25548 | 3.80 |
| Trace | bfloat16 | [2048 2048] | contiguous | bwd | 107102 | 38651 | 2.77 |

cognaiger9, Feb 20 '25 03:02