Implement GatherV2
Open · cognaiger9 opened this issue 11 months ago · 0 comments
- Operation details: see the TensorFlow `GatherV2` reference.
- Add the GatherV2 operation with non-batched and batched backward kernels.
- Add a driver and gtests for the kernels.
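
For readers unfamiliar with the backward pass, the following is a minimal reference sketch, not the MIOpen kernel. It assumes the parameter tensor is viewed as a flat `[batch, outer, axis_size, inner]` array and the indices as `[batch, n]`; the function name and argument layout are hypothetical. The gradient of a gather is a scatter-add: each output-gradient slice is accumulated into the parameter slice its index selected. Setting `batch=1` gives the non-batched case.

```python
def gather_v2_backward(grad_out, indices, batch, outer, axis_size, inner):
    """Reference scatter-add for the gradient of a gather along one axis.

    grad_out: flat list laid out as [batch, outer, n, inner]
    indices:  flat list laid out as [batch, n], values in [0, axis_size)
    Returns a flat list laid out as [batch, outer, axis_size, inner].
    """
    n = len(indices) // batch
    grad_params = [0.0] * (batch * outer * axis_size * inner)
    for b in range(batch):
        for o in range(outer):
            for j in range(n):
                idx = indices[b * n + j]
                src = ((b * outer + o) * n + j) * inner
                dst = ((b * outer + o) * axis_size + idx) * inner
                for i in range(inner):
                    # Repeated indices accumulate into the same destination.
                    grad_params[dst + i] += grad_out[src + i]
    return grad_params


# Non-batched: index 0 appears twice, so its two gradient slices are summed.
non_batched = gather_v2_backward(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 2, 0],
    batch=1, outer=1, axis_size=3, inner=2)

# Batched: each batch entry uses its own row of indices.
batched = gather_v2_backward(
    [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 1],
    batch=2, outer=1, axis_size=2, inner=1)
```

Because repeated indices write to the same destination, a parallel GPU version of this loop generally needs atomic adds or a reduction step; how the kernels in this issue handle that is not stated here.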
Average improvement over ROCm

| type | bwd |
|---|---|
| float16 | 4.65 |
| float | 5.59 |
| bfloat16 | 5.16 |
Detailed Benchmark
float16

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| GatherV2 | 0 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 570977 | 244854 | 2.33 |
| GatherV2 | 1 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 452338 | 252338 | 1.79 |
| GatherV2 | 2 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 341809 | 105227 | 3.25 |
| GatherV2 | 3 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 387715 | 81778 | 4.74 |
| GatherV2 | 4 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 417489 | 84355 | 4.95 |
| GatherV2 | 0 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 411457 | 146649 | 2.81 |
| GatherV2 | 1 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 346176 | 112445 | 3.08 |
| GatherV2 | 2 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 499682 | 114063 | 4.38 |
| GatherV2 | 0 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 3129323 | 1710280 | 1.83 |
| GatherV2 | 1 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 1854078 | 1284180 | 1.44 |
| GatherV2 | 4 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 2486584 | 382152 | 6.51 |
| GatherV2 | 0 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 9583599 | 5384230 | 1.78 |
| GatherV2 | 1 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 2583113 | 1443680 | 1.79 |
| GatherV2 | 2 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 1550325 | 1038899 | 1.49 |
| GatherV2 | 3 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 1307337 | 633816 | 2.06 |
| GatherV2 | 0 | 0 | float16 | [16 16 32 64 128] | [16 32] | bwd | 51681692 | 43223100 | 1.20 |
| GatherV2 | 1 | 0 | float16 | [16 16 32 64 128] | [16 32] | bwd | 50925790 | 45607700 | 1.12 |
| GatherV2 | 1 | 1 | float16 | [2 4 8 32 64] | [2 8] | bwd | 287308 | 45692 | 6.29 |
| GatherV2 | 2 | 2 | float16 | [2 4 8 32 64] | [2 4] | bwd | 261031 | 32002 | 8.16 |
| GatherV2 | 4 | 2 | float16 | [2 4 8 32 64] | [2 4] | bwd | 321297 | 28944 | 11.10 |
| GatherV2 | 1 | 1 | float16 | [4 16 32 64 64] | [4 16] | bwd | 2953109 | 247983 | 11.91 |
| GatherV2 | 2 | 2 | float16 | [4 16 32 64 64] | [4 16] | bwd | 488315 | 66778 | 7.31 |
| GatherV2 | 4 | 2 | float16 | [4 16 32 64 64] | [4 16] | bwd | 645411 | 101625 | 6.35 |

float32

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| GatherV2 | 0 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 412338 | 74560 | 5.53 |
| GatherV2 | 1 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 362641 | 81280 | 4.46 |
| GatherV2 | 2 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 355501 | 43715 | 8.13 |
| GatherV2 | 3 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 352977 | 43217 | 8.17 |
| GatherV2 | 4 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 441666 | 49848 | 8.86 |
| GatherV2 | 0 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 336881 | 79929 | 4.21 |
| GatherV2 | 1 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 388658 | 58702 | 6.62 |
| GatherV2 | 2 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 361425 | 56569 | 6.39 |
| GatherV2 | 0 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 2090839 | 905531 | 2.31 |
| GatherV2 | 1 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 1279412 | 551184 | 2.32 |
| GatherV2 | 2 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 1098788 | 657868 | 1.67 |
| GatherV2 | 3 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 696770 | 275432 | 2.53 |
| GatherV2 | 0 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 7004803 | 4010500 | 1.75 |
| GatherV2 | 1 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 2017621 | 1158890 | 1.74 |
| GatherV2 | 2 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 1344719 | 624252 | 2.15 |
| GatherV2 | 3 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 1471460 | 357056 | 4.12 |
| GatherV2 | 0 | 0 | float32 | [16 16 32 64 128] | [16 32] | bwd | 37657097 | 32223999 | 1.17 |
| GatherV2 | 3 | 0 | float32 | [16 16 32 64 128] | [16 32] | bwd | 15822371 | 11934400 | 1.33 |
| GatherV2 | 1 | 1 | float32 | [2 2 4 6 8] | [2 4] | bwd | 302031 | 30544 | 9.89 |
| GatherV2 | 2 | 2 | float32 | [2 2 4 6 8] | [2 2] | bwd | 266457 | 34598 | 7.70 |
| GatherV2 | 4 | 2 | float32 | [2 2 4 6 8] | [2 2] | bwd | 278411 | 32678 | 8.52 |
| GatherV2 | 1 | 1 | float32 | [2 4 8 32 64] | [2 8] | bwd | 269914 | 28997 | 9.31 |
| GatherV2 | 2 | 2 | float32 | [2 4 8 32 64] | [2 4] | bwd | 301791 | 40998 | 7.36 |
| GatherV2 | 4 | 2 | float32 | [2 4 8 32 64] | [2 4] | bwd | 245494 | 30153 | 8.14 |
| GatherV2 | 1 | 1 | float32 | [4 16 32 64 64] | [4 16] | bwd | 2256154 | 222577 | 10.14 |
| GatherV2 | 2 | 2 | float32 | [4 16 32 64 64] | [4 16] | bwd | 562390 | 66049 | 8.51 |
| GatherV2 | 4 | 2 | float32 | [4 16 32 64 64] | [4 16] | bwd | 912272 | 114960 | 7.94 |

bfloat16

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| GatherV2 | 0 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 594514 | 236658 | 2.51 |
| GatherV2 | 1 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 472834 | 248161 | 1.91 |
| GatherV2 | 2 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 366961 | 111147 | 3.30 |
| GatherV2 | 3 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 420721 | 87947 | 4.78 |
| GatherV2 | 4 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 493618 | 82115 | 6.01 |
| GatherV2 | 0 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 416145 | 130294 | 3.19 |
| GatherV2 | 1 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 445201 | 117511 | 3.79 |
| GatherV2 | 2 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 430690 | 107502 | 4.01 |
| GatherV2 | 3 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 358578 | 38897 | 9.22 |
| GatherV2 | 0 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 3190058 | 1470390 | 2.17 |
| GatherV2 | 1 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 1907398 | 1150150 | 1.66 |
| GatherV2 | 4 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 2475236 | 365103 | 6.78 |
| GatherV2 | 0 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 9823571 | 5387530 | 1.82 |
| GatherV2 | 1 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 2671833 | 1452310 | 1.84 |
| GatherV2 | 2 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 1596746 | 1028559 | 1.55 |
| GatherV2 | 3 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 1274308 | 631826 | 2.02 |
| GatherV2 | 0 | 0 | bfloat16 | [16 16 32 64 128] | [16 32] | bwd | 52973827 | 43275600 | 1.22 |
| GatherV2 | 1 | 0 | bfloat16 | [16 16 32 64 128] | [16 32] | bwd | 52040938 | 45934000 | 1.13 |
| GatherV2 | 1 | 1 | bfloat16 | [2 2 4 6 8] | [2 4] | bwd | 305423 | 30651 | 9.96 |
| GatherV2 | 2 | 2 | bfloat16 | [2 2 4 6 8] | [2 2] | bwd | 264632 | 27735 | 9.54 |
| GatherV2 | 4 | 2 | bfloat16 | [2 2 4 6 8] | [2 2] | bwd | 265897 | 27095 | 9.81 |
| GatherV2 | 1 | 1 | bfloat16 | [2 4 8 32 64] | [2 8] | bwd | 298350 | 30971 | 9.63 |
| GatherV2 | 4 | 2 | bfloat16 | [2 4 8 32 64] | [2 4] | bwd | 263545 | 28339 | 9.30 |
| GatherV2 | 1 | 1 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 3029180 | 251201 | 12.06 |
| GatherV2 | 2 | 2 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 510302 | 66333 | 7.69 |
| GatherV2 | 4 | 2 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 739456 | 101572 | 7.28 |
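
The "MIOpen vs ROCm" column appears to be the ROCm timing divided by the MIOpen timing, rounded to two decimals, so values above 1 mean MIOpen is faster. The issue does not state the timing unit. A small sketch for rechecking a row, with a hypothetical helper name `speedup`:

```python
def speedup(rocm_time, miopen_time):
    """Ratio of the ROCm timing to the MIOpen timing, rounded to 2 decimals."""
    return round(rocm_time / miopen_time, 2)


# First float16 row and last bfloat16 row from the tables above.
first_fp16 = speedup(570977, 244854)   # table lists 2.33
last_bf16 = speedup(739456, 101572)    # table lists 7.28
```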