Implement IndexSelect
Open · cognaiger9 opened this issue 10 months ago · 0 comments
- Add IndexSelect operation with forward and backward kernels.
- Add driver and gtest for kernels.
- MIOpen performs better when the number of output elements is less than 100000.
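For readers unfamiliar with the operation, here is a minimal reference sketch of what the forward and backward passes compute. It assumes the usual index-select semantics (as in PyTorch's `torch.index_select`) on a contiguous, row-major tensor stored as a flat list. The function names `index_select_forward` and `index_select_backward` are hypothetical and are not the MIOpen API or kernels from this change.

```python
from math import prod


def index_select_forward(x, shape, dim, indices):
    """Gather slices of a row-major flat tensor `x` along dimension `dim`."""
    outer = prod(shape[:dim])        # number of blocks before `dim`
    inner = prod(shape[dim + 1:])    # contiguous elements after `dim`
    size = shape[dim]
    out = []
    for o in range(outer):
        for idx in indices:
            start = (o * size + idx) * inner
            out.extend(x[start:start + inner])
    out_shape = list(shape)
    out_shape[dim] = len(indices)
    return out, out_shape


def index_select_backward(grad_out, shape, dim, indices):
    """Scatter-add output gradients into an input-shaped gradient.

    Repeated indices accumulate, which is why the backward pass is a
    scatter-add rather than a plain copy.
    """
    outer = prod(shape[:dim])
    inner = prod(shape[dim + 1:])
    size = shape[dim]
    n = len(indices)
    grad_in = [0.0] * prod(shape)
    for o in range(outer):
        for j, idx in enumerate(indices):
            src = (o * n + j) * inner
            dst = (o * size + idx) * inner
            for k in range(inner):
                grad_in[dst + k] += grad_out[src + k]
    return grad_in


# Illustrative 2x3 input; select columns 2, 0, 2 along dim 1.
x = [0, 1, 2, 3, 4, 5]
y, y_shape = index_select_forward(x, [2, 3], 1, [2, 0, 2])
gx = index_select_backward([1.0] * len(y), [2, 3], 1, [2, 0, 2])
```

In this example column 2 is selected twice, so its gradient entries receive two contributions, while column 1 is never selected and keeps a zero gradient.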
Average improvement over ROCm
| type | fwd | bwd |
| --- | --- | --- |
| float16 | 1.6 | 1.94 |
| float | 1.67 | 1.55 |
| bfloat16 | 1.63 | 1.76 |
Detailed Benchmark
float16 (forward)
| op_name | dtype | input | indices | cont | dim | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexSelect | float16 | [16 16 16] | [5] | noncont | 0 | fwd | 12896 | 10737 | 1.20 |
| IndexSelect | float16 | [16 32 32] | [16] | cont | 0 | fwd | 13168 | 8124 | 1.62 |
| IndexSelect | float16 | [16 32 32] | [16] | noncont | 0 | fwd | 19728 | 9066 | 2.18 |
| IndexSelect | float16 | [16 32 32] | [16] | cont | 1 | fwd | 16320 | 8675 | 1.88 |
| IndexSelect | float16 | [16 32 32] | [16] | noncont | 1 | fwd | 16832 | 9191 | 1.83 |
| IndexSelect | float16 | [32 32 32] | [5] | cont | 1 | fwd | 10576 | 8871 | 1.19 |
| IndexSelect | float16 | [32 32 32] | [5] | noncont | 1 | fwd | 11504 | 9048 | 1.27 |
| IndexSelect | float16 | [32 32 32] | [5] | noncont | 2 | fwd | 13056 | 9368 | 1.39 |
| IndexSelect | float16 | [100 50 20] | [10] | cont | 0 | fwd | 12288 | 8337 | 1.47 |
| IndexSelect | float16 | [100 50 20] | [10] | noncont | 0 | fwd | 18160 | 9990 | 1.82 |
| IndexSelect | float16 | [100 50 20] | [10] | cont | 1 | fwd | 15456 | 8213 | 1.88 |
| IndexSelect | float16 | [100 50 20] | [10] | noncont | 1 | fwd | 15824 | 9031 | 1.75 |
| IndexSelect | float16 | [100 50 20] | [10] | cont | 2 | fwd | 11056 | 8746 | 1.26 |
| IndexSelect | float16 | [100 50 20] | [10] | noncont | 2 | fwd | 18096 | 10364 | 1.75 |
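The Improvement column is consistent with the ROCm measurement divided by the MIOpen measurement, and the float16 forward average reported earlier (1.6) is consistent with the arithmetic mean of those ratios. The issue does not state the formula or the timing units, so treat both as inferences from the numbers. A small sketch that recomputes them from the float16 forward rows (the variable names are illustrative):

```python
# (ROCm, MIOpen) measurements copied from the float16 forward rows.
float16_fwd = [
    (12896, 10737), (13168, 8124), (19728, 9066), (16320, 8675),
    (16832, 9191), (10576, 8871), (11504, 9048), (13056, 9368),
    (12288, 8337), (18160, 9990), (15456, 8213), (15824, 9031),
    (11056, 8746), (18096, 10364),
]

# Assumed definition: improvement = ROCm / MIOpen (above 1 means MIOpen is faster).
ratios = [rocm / miopen for rocm, miopen in float16_fwd]
mean_improvement = sum(ratios) / len(ratios)

print([round(r, 2) for r in ratios])
print(round(mean_improvement, 1))
```

Under this reading, a value below 1 (such as the 0.97 entry in the bfloat16 forward table) marks a case where MIOpen was slower than the ROCm baseline.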
float32 (forward)
| op_name | dtype | input | indices | cont | dim | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexSelect | float32 | [16 16 16] | [5] | noncont | 0 | fwd | 12720 | 9439 | 1.35 |
| IndexSelect | float32 | [16 16 16] | [5] | noncont | 2 | fwd | 12176 | 9457 | 1.29 |
| IndexSelect | float32 | [16 32 32] | [16] | cont | 0 | fwd | 13152 | 8159 | 1.61 |
| IndexSelect | float32 | [16 32 32] | [16] | noncont | 0 | fwd | 19408 | 9155 | 2.12 |
| IndexSelect | float32 | [16 32 32] | [16] | cont | 1 | fwd | 17120 | 8408 | 2.04 |
| IndexSelect | float32 | [16 32 32] | [16] | noncont | 1 | fwd | 17728 | 8746 | 2.03 |
| IndexSelect | float32 | [32 32 32] | [5] | cont | 2 | fwd | 10064 | 8871 | 1.13 |
| IndexSelect | float32 | [32 32 32] | [5] | noncont | 2 | fwd | 13216 | 9262 | 1.43 |
| IndexSelect | float32 | [100 50 20] | [10] | cont | 0 | fwd | 12416 | 8462 | 1.47 |
| IndexSelect | float32 | [100 50 20] | [10] | noncont | 0 | fwd | 19136 | 9155 | 2.09 |
| IndexSelect | float32 | [100 50 20] | [10] | cont | 1 | fwd | 15887 | 8284 | 1.92 |
| IndexSelect | float32 | [100 50 20] | [10] | noncont | 1 | fwd | 16016 | 9546 | 1.68 |
| IndexSelect | float32 | [100 50 20] | [10] | cont | 2 | fwd | 12224 | 8835 | 1.38 |
| IndexSelect | float32 | [100 50 20] | [10] | noncont | 2 | fwd | 18704 | 9884 | 1.89 |
bfloat16 (forward)
| op_name | dtype | input | indices | cont | dim | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexSelect | bfloat16 | [16 16 16] | [5] | noncont | 0 | fwd | 12864 | 10364 | 1.24 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | cont | 0 | fwd | 12976 | 8266 | 1.57 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | noncont | 0 | fwd | 19232 | 9031 | 2.13 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | cont | 1 | fwd | 16576 | 7768 | 2.13 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | noncont | 1 | fwd | 16608 | 8942 | 1.86 |
| IndexSelect | bfloat16 | [32 32 32] | [5] | cont | 0 | fwd | 9040 | 9315 | 0.97 |
| IndexSelect | bfloat16 | [32 32 32] | [5] | noncont | 0 | fwd | 12208 | 9244 | 1.32 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | cont | 0 | fwd | 12048 | 8764 | 1.37 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | noncont | 0 | fwd | 17856 | 9386 | 1.90 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | cont | 1 | fwd | 15536 | 8231 | 1.89 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | noncont | 1 | fwd | 15696 | 8906 | 1.76 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | cont | 2 | fwd | 11056 | 8728 | 1.27 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | noncont | 2 | fwd | 18208 | 10239 | 1.78 |
float16 (backward)
| op_name | dtype | input | indices | cont | dim | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexSelect | float16 | [16 16 16] | [5] | cont | 0 | bwd | 66127 | 23928 | 2.76 |
| IndexSelect | float16 | [16 16 16] | [5] | noncont | 0 | bwd | 73551 | 26595 | 2.77 |
| IndexSelect | float16 | [16 32 32] | [16] | cont | 0 | bwd | 42000 | 25653 | 1.64 |
| IndexSelect | float16 | [16 32 32] | [16] | noncont | 0 | bwd | 57120 | 26221 | 2.18 |
| IndexSelect | float16 | [16 32 32] | [16] | cont | 2 | bwd | 39765 | 27057 | 1.47 |
| IndexSelect | float16 | [16 32 32] | [16] | noncont | 2 | bwd | 56479 | 26151 | 2.16 |
| IndexSelect | float16 | [32 32 32] | [5] | cont | 0 | bwd | 30656 | 23644 | 1.30 |
| IndexSelect | float16 | [32 32 32] | [5] | noncont | 0 | bwd | 40944 | 26186 | 1.56 |
| IndexSelect | float16 | [100 50 20] | [10] | cont | 0 | bwd | 37056 | 25475 | 1.45 |
| IndexSelect | float16 | [100 50 20] | [10] | noncont | 0 | bwd | 62303 | 27324 | 2.28 |
| IndexSelect | float16 | [100 50 20] | [10] | cont | 1 | bwd | 38160 | 23110 | 1.65 |
| IndexSelect | float16 | [100 50 20] | [10] | noncont | 1 | bwd | 56863 | 23502 | 2.42 |
| IndexSelect | float16 | [100 50 20] | [10] | noncont | 2 | bwd | 50831 | 32497 | 1.56 |
float32 (backward)
| op_name | dtype | input | indices | cont | dim | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexSelect | float32 | [16 16 16] | [5] | cont | 0 | bwd | 33808 | 25333 | 1.33 |
| IndexSelect | float32 | [16 16 16] | [5] | noncont | 0 | bwd | 39168 | 24213 | 1.62 |
| IndexSelect | float32 | [16 32 32] | [16] | cont | 0 | bwd | 28800 | 23555 | 1.22 |
| IndexSelect | float32 | [16 32 32] | [16] | noncont | 0 | bwd | 48048 | 25670 | 1.87 |
| IndexSelect | float32 | [16 32 32] | [16] | cont | 1 | bwd | 34592 | 26275 | 1.32 |
| IndexSelect | float32 | [16 32 32] | [16] | noncont | 1 | bwd | 44592 | 27377 | 1.63 |
| IndexSelect | float32 | [32 32 32] | [5] | noncont | 0 | bwd | 55616 | 24213 | 2.30 |
| IndexSelect | float32 | [100 50 20] | [10] | noncont | 0 | bwd | 36624 | 25795 | 1.42 |
| IndexSelect | float32 | [100 50 20] | [10] | noncont | 1 | bwd | 43904 | 27377 | 1.60 |
| IndexSelect | float32 | [100 50 20] | [10] | noncont | 2 | bwd | 36464 | 29884 | 1.22 |
bfloat16 (backward)
| op_name | dtype | input | indices | cont | dim | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IndexSelect | bfloat16 | [16 16 16] | [5] | cont | 0 | bwd | 55472 | 30346 | 1.83 |
| IndexSelect | bfloat16 | [16 16 16] | [5] | noncont | 0 | bwd | 51999 | 31857 | 1.63 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | cont | 0 | bwd | 49888 | 24942 | 2.00 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | noncont | 0 | bwd | 55712 | 25119 | 2.22 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | cont | 1 | bwd | 50352 | 28515 | 1.77 |
| IndexSelect | bfloat16 | [16 32 32] | [16] | noncont | 1 | bwd | 58975 | 27235 | 2.17 |
| IndexSelect | bfloat16 | [32 32 32] | [5] | cont | 1 | bwd | 33392 | 24835 | 1.34 |
| IndexSelect | bfloat16 | [32 32 32] | [5] | noncont | 1 | bwd | 41488 | 25280 | 1.64 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | cont | 0 | bwd | 40416 | 26257 | 1.54 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | noncont | 0 | bwd | 43488 | 24373 | 1.78 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | cont | 1 | bwd | 39376 | 26950 | 1.46 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | noncont | 1 | bwd | 51776 | 25671 | 2.02 |
| IndexSelect | bfloat16 | [100 50 20] | [10] | noncont | 2 | bwd | 42688 | 27661 | 1.54 |