MIOpen

Implement Gather

Open • cognaiger9 opened this issue 10 months ago • 0 comments

  • Add the Gather operation with a forward kernel.
  • Add a driver and gtest for the kernel.
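For reference, here is a minimal CPU sketch of what a forward Gather computes. It assumes PyTorch `torch.gather`-style semantics: the output takes the shape of the index tensor, and along `dim` each element is looked up by its index. The shapes in the benchmark tables (for example input `[512 85742]` with indices `[512 1]` on dim 1) are consistent with that convention, but this is an illustration, not the kernel in this PR, and `gather_2d` is a hypothetical name.

```python
def gather_2d(inp, index, dim):
    """Reference gather on 2-D nested lists (torch.gather-style semantics).

    out[i][j] = inp[index[i][j]][j]  if dim == 0
    out[i][j] = inp[i][index[i][j]]  if dim == 1
    The output takes the shape of `index`.
    """
    if dim not in (0, 1):
        raise ValueError("dim must be 0 or 1 for a 2-D tensor")
    out = []
    for i, row in enumerate(index):
        out_row = []
        for j, k in enumerate(row):
            # Bounds-check the looked-up index along the gather dimension.
            size = len(inp) if dim == 0 else len(inp[i])
            if not 0 <= k < size:
                raise IndexError(f"index {k} out of range for size {size}")
            out_row.append(inp[k][j] if dim == 0 else inp[i][k])
        out.append(out_row)
    return out


x = [[10, 11, 12, 13],
     [20, 21, 22, 23]]
# One index per row along dim 1, like the [N 1] index shapes in the tables.
print(gather_2d(x, [[3], [0]], dim=1))      # [[13], [20]]
print(gather_2d(x, [[1, 0, 1, 0]], dim=0))  # [[20, 11, 22, 13]]
```

A GPU kernel would typically assign one output element per thread and perform the same index lookup, which is why the output (index) shape rather than the input shape drives the amount of work.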

Average improvement over ROCm (ROCm / MIOpen)

| dtype | fwd |
|---|---|
| float16 | 1.39 |
| float32 | 1.38 |
| bfloat16 | 1.38 |
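Each "MIOpen vs ROCm" entry in the detailed tables equals the ROCm figure divided by the MIOpen figure. Assuming both columns are run times (the PR does not state units), values above 1 mean MIOpen is faster. The PR also does not say how the averages are formed; the sketch below assumes an arithmetic mean of the per-case ratios and checks it against the float16 timings copied from the detailed table. `speedups` and `FLOAT16_TIMINGS` are illustrative names.

```python
from statistics import mean

# (ROCm, MIOpen) timings for the float16 cases, copied from the detailed table.
FLOAT16_TIMINGS = [
    (20144, 14009), (15248, 11396), (14576, 10880), (19248, 11662),
    (16224, 10773), (12417, 10472), (12658, 11200), (14578, 10969),
    (13314, 9120), (15314, 10632), (16178, 11432), (15922, 11094),
]


def speedups(timings):
    """Per-case ratio: ROCm figure / MIOpen figure (>1 favors MIOpen if these are times)."""
    return [rocm / miopen for rocm, miopen in timings]


per_case = speedups(FLOAT16_TIMINGS)
print([round(s, 2) for s in per_case])  # same values as the "MIOpen vs ROCm" column
print(round(mean(per_case), 2))         # 1.39, the float16 average above
```

Applied to the thirteen float32 rows, the same arithmetic mean comes to about 1.389, which rounds to 1.39 rather than the listed 1.38, so the summary may truncate instead of round or use a different averaging rule.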

Detailed Benchmark

float16

| op_name | dtype | input size | indices size | contiguous | dim | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| Gather | float16 | [512 85742] | [512 1] | contiguous | 1 | fwd | 20144 | 14009 | 1.44 |
| Gather | float16 | [4096 9192] | [4096 1] | contiguous | 1 | fwd | 15248 | 11396 | 1.34 |
| Gather | float16 | [4096 9192] | [4096 1] | noncontiguous | 1 | fwd | 14576 | 10880 | 1.34 |
| Gather | float16 | [9192 18384] | [9192 1] | contiguous | 1 | fwd | 19248 | 11662 | 1.65 |
| Gather | float16 | [9192 18384] | [9192 1] | noncontiguous | 1 | fwd | 16224 | 10773 | 1.51 |
| Gather | float16 | [2048 2048] | [2048 1] | noncontiguous | 1 | fwd | 12417 | 10472 | 1.19 |
| Gather | float16 | [512 1024] | [512 2] | contiguous | 1 | fwd | 12658 | 11200 | 1.13 |
| Gather | float16 | [512 4096] | [512 4] | noncontiguous | 1 | fwd | 14578 | 10969 | 1.33 |
| Gather | float16 | [1024 1024] | [1024 16] | contiguous | 1 | fwd | 13314 | 9120 | 1.46 |
| Gather | float16 | [1024 1024] | [1024 16] | noncontiguous | 1 | fwd | 15314 | 10632 | 1.44 |
| Gather | float16 | [4096 9192] | [4096 4] | contiguous | 1 | fwd | 16178 | 11432 | 1.42 |
| Gather | float16 | [4096 9192] | [4096 4] | noncontiguous | 1 | fwd | 15922 | 11094 | 1.44 |

float32

| op_name | dtype | input size | indices size | contiguous | dim | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| Gather | float32 | [4096 9192] | [4096 1] | contiguous | 1 | fwd | 18160 | 10933 | 1.66 |
| Gather | float32 | [4096 9192] | [4096 1] | noncontiguous | 1 | fwd | 15200 | 11467 | 1.33 |
| Gather | float32 | [9192 18384] | [9192 1] | contiguous | 1 | fwd | 23376 | 11644 | 2.01 |
| Gather | float32 | [9192 18384] | [9192 1] | noncontiguous | 1 | fwd | 14688 | 10844 | 1.35 |
| Gather | float32 | [256 512] | [256 16] | contiguous | 0 | fwd | 12097 | 9672 | 1.25 |
| Gather | float32 | [2048 2048] | [2048 1] | noncontiguous | 1 | fwd | 12897 | 10454 | 1.23 |
| Gather | float32 | [512 1024] | [512 2] | contiguous | 1 | fwd | 14065 | 11058 | 1.27 |
| Gather | float32 | [512 1024] | [512 2] | noncontiguous | 1 | fwd | 12850 | 10792 | 1.19 |
| Gather | float32 | [512 4096] | [512 4] | noncontiguous | 1 | fwd | 12193 | 10151 | 1.20 |
| Gather | float32 | [1024 1024] | [1024 16] | contiguous | 1 | fwd | 13234 | 10027 | 1.32 |
| Gather | float32 | [1024 1024] | [1024 16] | noncontiguous | 1 | fwd | 15282 | 10720 | 1.43 |
| Gather | float32 | [4096 9192] | [4096 4] | contiguous | 1 | fwd | 15171 | 11378 | 1.33 |
| Gather | float32 | [4096 9192] | [4096 4] | noncontiguous | 1 | fwd | 16402 | 11094 | 1.48 |

bfloat16

| op_name | dtype | input size | indices size | contiguous | dim | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| Gather | bfloat16 | [1024 4096] | [1024 1] | noncontiguous | 1 | fwd | 13152 | 10827 | 1.21 |
| Gather | bfloat16 | [4096 9192] | [4096 1] | contiguous | 1 | fwd | 16112 | 11342 | 1.42 |
| Gather | bfloat16 | [4096 9192] | [4096 1] | noncontiguous | 1 | fwd | 15152 | 11413 | 1.33 |
| Gather | bfloat16 | [9192 18384] | [9192 1] | contiguous | 1 | fwd | 19184 | 11751 | 1.63 |
| Gather | bfloat16 | [9192 18384] | [9192 1] | noncontiguous | 1 | fwd | 15808 | 11022 | 1.43 |
| Gather | bfloat16 | [18384 18384] | [18384 1] | contiguous | 1 | fwd | 21216 | 15893 | 1.33 |
| Gather | bfloat16 | [18384 18384] | [18384 1] | noncontiguous | 1 | fwd | 16000 | 11146 | 1.44 |
| Gather | bfloat16 | [512 4096] | [512 4] | noncontiguous | 1 | fwd | 15506 | 10987 | 1.41 |
| Gather | bfloat16 | [1024 1024] | [1024 16] | contiguous | 1 | fwd | 11634 | 9423 | 1.23 |
| Gather | bfloat16 | [1024 1024] | [1024 16] | noncontiguous | 1 | fwd | 15201 | 10596 | 1.43 |
| Gather | bfloat16 | [4096 9192] | [4096 4] | contiguous | 1 | fwd | 15154 | 11396 | 1.33 |
| Gather | bfloat16 | [4096 9192] | [4096 4] | noncontiguous | 1 | fwd | 15634 | 11076 | 1.41 |
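The "contiguous" column distinguishes packed inputs from noncontiguous ones. A common way for one kernel to handle both is to address elements through per-dimension strides instead of assuming a packed row-major layout. The sketch below illustrates that idea on a flat Python list; `strided_gather_dim1` is a hypothetical helper, not MIOpen's kernel or API, and it assumes `torch.gather`-style semantics along dim 1 with strides counted in elements.

```python
def strided_gather_dim1(buf, shape, strides, offset, index):
    """Gather along dim 1 from a 2-D strided view of a flat buffer.

    Element (i, k) of the view lives at buf[offset + i*strides[0] + k*strides[1]],
    so the same code handles packed and non-packed (e.g. transposed) views.
    Returns out[i][j] = view[i][index[i][j]].
    """
    rows, cols = shape
    out = []
    for i, idx_row in enumerate(index):
        if not 0 <= i < rows:
            raise IndexError(f"row {i} out of range for {rows} rows")
        out_row = []
        for k in idx_row:
            if not 0 <= k < cols:
                raise IndexError(f"index {k} out of range for size {cols}")
            out_row.append(buf[offset + i * strides[0] + k * strides[1]])
        out.append(out_row)
    return out


# Packed row-major storage for the 3x2 matrix [[0, 1], [2, 3], [4, 5]].
storage = [0, 1, 2, 3, 4, 5]
index = [[1], [0]]

# Contiguous 3x2 view: strides (2, 1).
print(strided_gather_dim1(storage, (3, 2), (2, 1), 0, index))  # [[1], [2]]
# Noncontiguous 2x3 transposed view of the same storage: strides (1, 2),
# i.e. view = [[0, 2, 4], [1, 3, 5]].
print(strided_gather_dim1(storage, (2, 3), (1, 2), 0, index))  # [[2], [1]]
```

The only difference between the two calls is the stride tuple; the memory access pattern changes (strided reads for the transposed view), which is one reason contiguous and noncontiguous cases are benchmarked separately.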

cognaiger9, Feb 19 '25 03:02