Implement Trace
Open · cognaiger9 opened this issue 10 months ago · 0 comments
- Add Trace operation with forward and backward kernels.
- Add driver and gtest for kernels.
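For reference, the semantics the two kernels are expected to implement can be sketched on the CPU. This is a minimal sketch, not the MIOpen kernel code. It assumes the conventional definition of trace (sum of the main diagonal of a 2D input, including non-square inputs such as the `[34 4]` case benchmarked below), and the function names `trace_forward` and `trace_backward` are hypothetical.

```python
def trace_forward(x):
    """Sum of the main diagonal of a 2D matrix given as a list of rows."""
    rows = len(x)
    cols = len(x[0]) if rows else 0
    # For a non-square matrix the diagonal has min(rows, cols) elements.
    return sum(x[i][i] for i in range(min(rows, cols)))


def trace_backward(grad_output, rows, cols):
    """Gradient of trace w.r.t. its input: grad_output on the diagonal, 0 elsewhere."""
    grad_input = [[0.0] * cols for _ in range(rows)]
    for i in range(min(rows, cols)):
        grad_input[i][i] = grad_output
    return grad_input


# Small non-square example (2 rows, 3 columns).
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
y = trace_forward(x)             # adds x[0][0] and x[1][1]
dx = trace_backward(1.0, 2, 3)   # upstream gradient of 1.0
```

A host-side reference of this shape is the kind of thing a gtest could compare kernel output against, though the issue does not say how the tests are structured.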
Average improvement over ROCm

| type | fwd | bwd |
| --- | --- | --- |
| float16 | 2.35 | 3.07 |
| float | 3.26 | 3.3 |
| bfloat16 | 2.44 | 3.84 |
Detailed Benchmark
float16 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float16 | [34 4] | contiguous | fwd | 15808 | 6080 | 2.60 |
| Trace | float16 | [98 4] | contiguous | fwd | 15824 | 6009 | 2.63 |
| Trace | float16 | [190 4] | contiguous | fwd | 13008 | 6080 | 2.14 |
| Trace | float16 | [249 128] | contiguous | fwd | 12464 | 7733 | 1.61 |
| Trace | float16 | [349 222] | contiguous | fwd | 15168 | 7840 | 1.93 |
| Trace | float16 | [451 128] | contiguous | fwd | 11888 | 7662 | 1.55 |
| Trace | float16 | [2048 20480] | contiguous | fwd | 31967 | 22952 | 1.39 |
| Trace | float16 | [4096 45960] | contiguous | fwd | 65006 | 25352 | 2.56 |
| Trace | float16 | [8192 8192] | contiguous | fwd | 47343 | 23130 | 2.05 |
| Trace | float16 | [16384 16384] | contiguous | fwd | 118285 | 23521 | 5.03 |
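The relationship between the per-case rows and the summary table can be checked with a few lines of arithmetic. The sketch below assumes that "Improvement" is the ROCm measurement divided by the MIOpen measurement and that the summary figure is the arithmetic mean of those per-case ratios. The issue does not state either definition or the unit of the measurements, but both assumptions are consistent with the float16 forward rows above.

```python
# ROCm and MIOpen columns copied from the float16 (forward) table.
rocm = [15808, 15824, 13008, 12464, 15168, 11888, 31967, 65006, 47343, 118285]
miopen = [6080, 6009, 6080, 7733, 7840, 7662, 22952, 25352, 23130, 23521]

# Assumed definition: improvement = ROCm measurement / MIOpen measurement.
ratios = [r / m for r, m in zip(rocm, miopen)]

# Assumed definition: summary value = arithmetic mean of the per-case ratios.
average = sum(ratios) / len(ratios)

print([round(v, 2) for v in ratios])
print(round(average, 2))
```

Rounded to two decimals, the mean matches the float16 fwd entry in the summary table (2.35).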
float16 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float16 | [34 4] | contiguous | bwd | 92958 | 24766 | 3.75 |
| Trace | float16 | [98 4] | contiguous | bwd | 99566 | 29033 | 3.43 |
| Trace | float16 | [190 4] | contiguous | bwd | 107310 | 26544 | 4.04 |
| Trace | float16 | [249 128] | contiguous | bwd | 107614 | 24979 | 4.31 |
| Trace | float16 | [349 222] | contiguous | bwd | 67825 | 23966 | 2.83 |
| Trace | float16 | [451 128] | contiguous | bwd | 65054 | 27468 | 2.37 |
| Trace | float16 | [603 546] | contiguous | bwd | 74990 | 30757 | 2.44 |
| Trace | float16 | [1024 10240] | contiguous | bwd | 105630 | 67133 | 1.57 |
| Trace | float16 | [1024 1024] | contiguous | bwd | 90894 | 25708 | 3.54 |
| Trace | float16 | [2048 2048] | contiguous | bwd | 91998 | 37726 | 2.44 |
float32 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float32 | [34 4] | contiguous | fwd | 15936 | 5653 | 2.82 |
| Trace | float32 | [98 4] | contiguous | fwd | 14784 | 5796 | 2.55 |
| Trace | float32 | [190 4] | contiguous | fwd | 13808 | 5458 | 2.53 |
| Trace | float32 | [249 128] | contiguous | fwd | 12576 | 7556 | 1.66 |
| Trace | float32 | [349 222] | contiguous | fwd | 14944 | 7520 | 1.99 |
| Trace | float32 | [451 128] | contiguous | fwd | 10736 | 7840 | 1.37 |
| Trace | float32 | [2048 20480] | contiguous | fwd | 46959 | 25761 | 1.82 |
| Trace | float32 | [4096 45960] | contiguous | fwd | 79550 | 21192 | 3.75 |
| Trace | float32 | [4096 4096] | contiguous | fwd | 28544 | 22721 | 1.26 |
| Trace | float32 | [8192 8192] | contiguous | fwd | 68670 | 22792 | 3.01 |
| Trace | float32 | [16384 16384] | contiguous | fwd | 208124 | 21067 | 9.88 |
float32 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | float32 | [34 4] | contiguous | bwd | 98350 | 24090 | 4.08 |
| Trace | float32 | [98 4] | contiguous | bwd | 97083 | 22277 | 4.36 |
| Trace | float32 | [190 4] | contiguous | bwd | 97902 | 23574 | 4.15 |
| Trace | float32 | [249 128] | contiguous | bwd | 98942 | 20908 | 4.73 |
| Trace | float32 | [349 222] | contiguous | bwd | 72878 | 24890 | 2.93 |
| Trace | float32 | [451 128] | contiguous | bwd | 63966 | 28748 | 2.23 |
| Trace | float32 | [603 546] | contiguous | bwd | 73470 | 27468 | 2.67 |
| Trace | float32 | [1024 10240] | contiguous | bwd | 157788 | 67115 | 2.35 |
| Trace | float32 | [1024 1024] | contiguous | bwd | 106190 | 30864 | 3.44 |
| Trace | float32 | [2048 2048] | contiguous | bwd | 148285 | 37140 | 3.99 |
| Trace | float32 | [4096 4096] | contiguous | bwd | 144429 | 99188 | 1.46 |
bfloat16 (forward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | bfloat16 | [34 4] | contiguous | fwd | 16368 | 5902 | 2.77 |
| Trace | bfloat16 | [98 4] | contiguous | fwd | 16304 | 6382 | 2.55 |
| Trace | bfloat16 | [190 4] | contiguous | fwd | 13152 | 5867 | 2.24 |
| Trace | bfloat16 | [249 128] | contiguous | fwd | 13760 | 7662 | 1.80 |
| Trace | bfloat16 | [349 222] | contiguous | fwd | 13536 | 7698 | 1.76 |
| Trace | bfloat16 | [451 128] | contiguous | fwd | 12512 | 8071 | 1.55 |
| Trace | bfloat16 | [2048 20480] | contiguous | fwd | 32127 | 21281 | 1.51 |
| Trace | bfloat16 | [4096 45960] | contiguous | fwd | 65182 | 22970 | 2.84 |
| Trace | bfloat16 | [8192 8192] | contiguous | fwd | 47471 | 23539 | 2.02 |
| Trace | bfloat16 | [16384 16384] | contiguous | fwd | 119613 | 22490 | 5.32 |
bfloat16 (backward)

| op_name | dtype | input size | contiguous | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Trace | bfloat16 | [34 4] | contiguous | bwd | 131661 | 25566 | 5.15 |
| Trace | bfloat16 | [98 4] | contiguous | bwd | 123069 | 23930 | 5.14 |
| Trace | bfloat16 | [190 4] | contiguous | bwd | 130157 | 22704 | 5.73 |
| Trace | bfloat16 | [249 128] | contiguous | bwd | 119470 | 24766 | 4.82 |
| Trace | bfloat16 | [349 222] | contiguous | bwd | 85134 | 24641 | 3.45 |
| Trace | bfloat16 | [451 128] | contiguous | bwd | 82382 | 23610 | 3.49 |
| Trace | bfloat16 | [603 546] | contiguous | bwd | 84830 | 33762 | 2.51 |
| Trace | bfloat16 | [1024 10240] | contiguous | bwd | 103645 | 68466 | 1.51 |
| Trace | bfloat16 | [1024 1024] | contiguous | bwd | 97166 | 25548 | 3.80 |
| Trace | bfloat16 | [2048 2048] | contiguous | bwd | 107102 | 38651 | 2.77 |