Implement Gather
cognaiger9 opened this issue 10 months ago
- Add Gather operation with forward kernel.
- Add driver and gtest for kernel.
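
For readers unfamiliar with the operation, here is a minimal CPU reference sketch of a 2-D gather. It assumes the semantics follow the common `torch.gather` convention (the issue does not state this explicitly), and the function name `gather_2d` is hypothetical, not part of MIOpen. It is only meant to illustrate what a forward kernel would compute, not how the kernel is implemented.

```python
def gather_2d(inp, index, dim):
    """Reference gather for 2-D nested lists (assumed torch.gather-style semantics).

    dim == 0: out[i][j] = inp[index[i][j]][j]
    dim == 1: out[i][j] = inp[i][index[i][j]]
    The output has the same shape as `index`.
    """
    if dim not in (0, 1):
        raise ValueError("dim must be 0 or 1 for a 2-D input")
    out = []
    for i, row in enumerate(index):
        out_row = []
        for j, k in enumerate(row):
            # Pick along the gather dimension using the index value k.
            out_row.append(inp[k][j] if dim == 0 else inp[i][k])
        out.append(out_row)
    return out


inp = [[10, 11, 12],
       [20, 21, 22]]
index = [[2], [0]]          # one index per row, like the [N 1] cases in the benchmark
result = gather_2d(inp, index, dim=1)
print(result)               # [[12], [20]]
```

A reference of this kind is what a gtest would typically compare the GPU output against, element by element.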
Average improvement over ROCm
| type | bwd |
| --- | --- |
| float16 | 1.39 |
| float | 1.38 |
| bfloat16 | 1.38 |
Detailed Benchmark
float16
| op_name | dtype | input size | indices size | contiguous | dim | direction | ROCm | MIOpen | MIOpen vs ROCm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gather | float16 | [512 85742] | [512 1] | contiguous | 1 | fwd | 20144 | 14009 | 1.44 |
| Gather | float16 | [4096 9192] | [4096 1] | contiguous | 1 | fwd | 15248 | 11396 | 1.34 |
| Gather | float16 | [4096 9192] | [4096 1] | noncontiguous | 1 | fwd | 14576 | 10880 | 1.34 |
| Gather | float16 | [9192 18384] | [9192 1] | contiguous | 1 | fwd | 19248 | 11662 | 1.65 |
| Gather | float16 | [9192 18384] | [9192 1] | noncontiguous | 1 | fwd | 16224 | 10773 | 1.51 |
| Gather | float16 | [2048 2048] | [2048 1] | noncontiguous | 1 | fwd | 12417 | 10472 | 1.19 |
| Gather | float16 | [512 1024] | [512 2] | contiguous | 1 | fwd | 12658 | 11200 | 1.13 |
| Gather | float16 | [512 4096] | [512 4] | noncontiguous | 1 | fwd | 14578 | 10969 | 1.33 |
| Gather | float16 | [1024 1024] | [1024 16] | contiguous | 1 | fwd | 13314 | 9120 | 1.46 |
| Gather | float16 | [1024 1024] | [1024 16] | noncontiguous | 1 | fwd | 15314 | 10632 | 1.44 |
| Gather | float16 | [4096 9192] | [4096 4] | contiguous | 1 | fwd | 16178 | 11432 | 1.42 |
| Gather | float16 | [4096 9192] | [4096 4] | noncontiguous | 1 | fwd | 15922 | 11094 | 1.44 |
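
The "MIOpen vs ROCm" column appears to be the ROCm measurement divided by the MIOpen measurement, rounded to two decimals, and the float16 entry in the average table appears to be the mean of those ratios. The sketch below checks that reading against the float16 rows above; the variable names are illustrative, and the issue does not state the unit of the measurements, so they are treated as opaque numbers.

```python
# (ROCm, MIOpen) pairs copied from the float16 table, in row order.
float16_rows = [
    (20144, 14009), (15248, 11396), (14576, 10880), (19248, 11662),
    (16224, 10773), (12417, 10472), (12658, 11200), (14578, 10969),
    (13314, 9120), (15314, 10632), (16178, 11432), (15922, 11094),
]

# Per-case ratio: a value above 1 means the MIOpen measurement is smaller.
speedups = [round(rocm / miopen, 2) for rocm, miopen in float16_rows]

# Mean of the per-case ratios, as reported in the average table.
average = sum(speedups) / len(speedups)

print(speedups[0])        # 1.44
print(round(average, 2))  # 1.39
```

Both values match the table, which supports (but does not prove) the assumed definition of the ratio column.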
float32
| op_name | dtype | input size | indices size | contiguous | dim | direction | ROCm | MIOpen | MIOpen vs ROCm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gather | float32 | [4096 9192] | [4096 1] | contiguous | 1 | fwd | 18160 | 10933 | 1.66 |
| Gather | float32 | [4096 9192] | [4096 1] | noncontiguous | 1 | fwd | 15200 | 11467 | 1.33 |
| Gather | float32 | [9192 18384] | [9192 1] | contiguous | 1 | fwd | 23376 | 11644 | 2.01 |
| Gather | float32 | [9192 18384] | [9192 1] | noncontiguous | 1 | fwd | 14688 | 10844 | 1.35 |
| Gather | float32 | [256 512] | [256 16] | contiguous | 0 | fwd | 12097 | 9672 | 1.25 |
| Gather | float32 | [2048 2048] | [2048 1] | noncontiguous | 1 | fwd | 12897 | 10454 | 1.23 |
| Gather | float32 | [512 1024] | [512 2] | contiguous | 1 | fwd | 14065 | 11058 | 1.27 |
| Gather | float32 | [512 1024] | [512 2] | noncontiguous | 1 | fwd | 12850 | 10792 | 1.19 |
| Gather | float32 | [512 4096] | [512 4] | noncontiguous | 1 | fwd | 12193 | 10151 | 1.20 |
| Gather | float32 | [1024 1024] | [1024 16] | contiguous | 1 | fwd | 13234 | 10027 | 1.32 |
| Gather | float32 | [1024 1024] | [1024 16] | noncontiguous | 1 | fwd | 15282 | 10720 | 1.43 |
| Gather | float32 | [4096 9192] | [4096 4] | contiguous | 1 | fwd | 15171 | 11378 | 1.33 |
| Gather | float32 | [4096 9192] | [4096 4] | noncontiguous | 1 | fwd | 16402 | 11094 | 1.48 |
bfloat16
| op_name | dtype | input size | indices size | contiguous | dim | direction | ROCm | MIOpen | MIOpen vs ROCm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gather | bfloat16 | [1024 4096] | [1024 1] | noncontiguous | 1 | fwd | 13152 | 10827 | 1.21 |
| Gather | bfloat16 | [4096 9192] | [4096 1] | contiguous | 1 | fwd | 16112 | 11342 | 1.42 |
| Gather | bfloat16 | [4096 9192] | [4096 1] | noncontiguous | 1 | fwd | 15152 | 11413 | 1.33 |
| Gather | bfloat16 | [9192 18384] | [9192 1] | contiguous | 1 | fwd | 19184 | 11751 | 1.63 |
| Gather | bfloat16 | [9192 18384] | [9192 1] | noncontiguous | 1 | fwd | 15808 | 11022 | 1.43 |
| Gather | bfloat16 | [18384 18384] | [18384 1] | contiguous | 1 | fwd | 21216 | 15893 | 1.33 |
| Gather | bfloat16 | [18384 18384] | [18384 1] | noncontiguous | 1 | fwd | 16000 | 11146 | 1.44 |
| Gather | bfloat16 | [512 4096] | [512 4] | noncontiguous | 1 | fwd | 15506 | 10987 | 1.41 |
| Gather | bfloat16 | [1024 1024] | [1024 16] | contiguous | 1 | fwd | 11634 | 9423 | 1.23 |
| Gather | bfloat16 | [1024 1024] | [1024 16] | noncontiguous | 1 | fwd | 15201 | 10596 | 1.43 |
| Gather | bfloat16 | [4096 9192] | [4096 4] | contiguous | 1 | fwd | 15154 | 11396 | 1.33 |
| Gather | bfloat16 | [4096 9192] | [4096 4] | noncontiguous | 1 | fwd | 15634 | 11076 | 1.41 |