Implement GatherV2

Open cognaiger9 opened this issue 11 months ago • 0 comments

  • Details of the operation (TensorFlow)
  • Add the GatherV2 operation with non-batched and batched backward kernels.
  • Add a driver and gtests for the kernels.
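
For readers unfamiliar with the operation, the gradient of a gather with respect to `params` is a scatter-add of the output gradient back into the gathered positions. The sketch below is a pure-Python CPU reference under stated assumptions, not the MIOpen kernels: the function name `gather_v2_backward` and the flat row-major list layout are hypothetical, and it assumes `axis >= batch_dims` and in-range indices. The output gradient is assumed to have shape `params_shape[:axis] + indices_shape[batch_dims:] + params_shape[axis+1:]`, following TensorFlow's gather semantics.

```python
from math import prod


def gather_v2_backward(out_grad, indices, params_shape, axis, batch_dims=0):
    """Hypothetical reference gradient of GatherV2 w.r.t. params.

    out_grad and indices are flat row-major lists. With batch_dims == 0
    this is the non-batched case; otherwise indices are split per batch.
    """
    batch = prod(params_shape[:batch_dims])      # leading batch dimensions
    outer = prod(params_shape[batch_dims:axis])  # dims between batch and axis
    gather_dim = params_shape[axis]              # dimension being gathered
    inner = prod(params_shape[axis + 1:])        # trailing dimensions
    per_batch = len(indices) // batch            # indices per batch entry

    params_grad = [0.0] * (batch * outer * gather_dim * inner)
    for b in range(batch):
        for o in range(outer):
            for k in range(per_batch):
                idx = indices[b * per_batch + k]
                if not 0 <= idx < gather_dim:
                    # Assumption: out-of-range indices are treated as an error.
                    raise IndexError(f"index {idx} out of range")
                src = ((b * outer + o) * per_batch + k) * inner
                dst = ((b * outer + o) * gather_dim + idx) * inner
                for i in range(inner):
                    # Repeated indices accumulate into the same slot.
                    params_grad[dst + i] += out_grad[src + i]
    return params_grad


# Non-batched: params shape [3, 2], axis 0, indices [0, 2, 0].
print(gather_v2_backward([1, 2, 3, 4, 5, 6], [0, 2, 0], [3, 2], axis=0))
# -> [6.0, 8.0, 0.0, 0.0, 3.0, 4.0]

# Batched: params shape [2, 3], axis 1, batch_dims 1, indices [[0, 0], [2, 1]].
print(gather_v2_backward([1, 2, 3, 4], [0, 0, 2, 1], [2, 3], axis=1, batch_dims=1))
# -> [3.0, 0.0, 0.0, 0.0, 4.0, 3.0]
```

Because repeated indices write to the same destination, a parallel GPU version of this loop would need atomic adds or another way to resolve those collisions.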

Average improvement over ROCm

| type | bwd |
|---|---|
| float16 | 4.65 |
| float | 5.59 |
| bfloat16 | 5.16 |

Detailed Benchmark

float16

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| GatherV2 | 0 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 570977 | 244854 | 2.33 |
| GatherV2 | 1 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 452338 | 252338 | 1.79 |
| GatherV2 | 2 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 341809 | 105227 | 3.25 |
| GatherV2 | 3 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 387715 | 81778 | 4.74 |
| GatherV2 | 4 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 417489 | 84355 | 4.95 |
| GatherV2 | 0 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 411457 | 146649 | 2.81 |
| GatherV2 | 1 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 346176 | 112445 | 3.08 |
| GatherV2 | 2 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 499682 | 114063 | 4.38 |
| GatherV2 | 0 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 3129323 | 1710280 | 1.83 |
| GatherV2 | 1 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 1854078 | 1284180 | 1.44 |
| GatherV2 | 4 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 2486584 | 382152 | 6.51 |
| GatherV2 | 0 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 9583599 | 5384230 | 1.78 |
| GatherV2 | 1 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 2583113 | 1443680 | 1.79 |
| GatherV2 | 2 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 1550325 | 1038899 | 1.49 |
| GatherV2 | 3 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 1307337 | 633816 | 2.06 |
| GatherV2 | 0 | 0 | float16 | [16 16 32 64 128] | [16 32] | bwd | 51681692 | 43223100 | 1.20 |
| GatherV2 | 1 | 0 | float16 | [16 16 32 64 128] | [16 32] | bwd | 50925790 | 45607700 | 1.12 |
| GatherV2 | 1 | 1 | float16 | [2 4 8 32 64] | [2 8] | bwd | 287308 | 45692 | 6.29 |
| GatherV2 | 2 | 2 | float16 | [2 4 8 32 64] | [2 4] | bwd | 261031 | 32002 | 8.16 |
| GatherV2 | 4 | 2 | float16 | [2 4 8 32 64] | [2 4] | bwd | 321297 | 28944 | 11.10 |
| GatherV2 | 1 | 1 | float16 | [4 16 32 64 64] | [4 16] | bwd | 2953109 | 247983 | 11.91 |
| GatherV2 | 2 | 2 | float16 | [4 16 32 64 64] | [4 16] | bwd | 488315 | 66778 | 7.31 |
| GatherV2 | 4 | 2 | float16 | [4 16 32 64 64] | [4 16] | bwd | 645411 | 101625 | 6.35 |

float32

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| GatherV2 | 0 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 412338 | 74560 | 5.53 |
| GatherV2 | 1 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 362641 | 81280 | 4.46 |
| GatherV2 | 2 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 355501 | 43715 | 8.13 |
| GatherV2 | 3 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 352977 | 43217 | 8.17 |
| GatherV2 | 4 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 441666 | 49848 | 8.86 |
| GatherV2 | 0 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 336881 | 79929 | 4.21 |
| GatherV2 | 1 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 388658 | 58702 | 6.62 |
| GatherV2 | 2 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 361425 | 56569 | 6.39 |
| GatherV2 | 0 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 2090839 | 905531 | 2.31 |
| GatherV2 | 1 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 1279412 | 551184 | 2.32 |
| GatherV2 | 2 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 1098788 | 657868 | 1.67 |
| GatherV2 | 3 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 696770 | 275432 | 2.53 |
| GatherV2 | 0 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 7004803 | 4010500 | 1.75 |
| GatherV2 | 1 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 2017621 | 1158890 | 1.74 |
| GatherV2 | 2 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 1344719 | 624252 | 2.15 |
| GatherV2 | 3 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 1471460 | 357056 | 4.12 |
| GatherV2 | 0 | 0 | float32 | [16 16 32 64 128] | [16 32] | bwd | 37657097 | 32223999 | 1.17 |
| GatherV2 | 3 | 0 | float32 | [16 16 32 64 128] | [16 32] | bwd | 15822371 | 11934400 | 1.33 |
| GatherV2 | 1 | 1 | float32 | [2 2 4 6 8] | [2 4] | bwd | 302031 | 30544 | 9.89 |
| GatherV2 | 2 | 2 | float32 | [2 2 4 6 8] | [2 2] | bwd | 266457 | 34598 | 7.70 |
| GatherV2 | 4 | 2 | float32 | [2 2 4 6 8] | [2 2] | bwd | 278411 | 32678 | 8.52 |
| GatherV2 | 1 | 1 | float32 | [2 4 8 32 64] | [2 8] | bwd | 269914 | 28997 | 9.31 |
| GatherV2 | 2 | 2 | float32 | [2 4 8 32 64] | [2 4] | bwd | 301791 | 40998 | 7.36 |
| GatherV2 | 4 | 2 | float32 | [2 4 8 32 64] | [2 4] | bwd | 245494 | 30153 | 8.14 |
| GatherV2 | 1 | 1 | float32 | [4 16 32 64 64] | [4 16] | bwd | 2256154 | 222577 | 10.14 |
| GatherV2 | 2 | 2 | float32 | [4 16 32 64 64] | [4 16] | bwd | 562390 | 66049 | 8.51 |
| GatherV2 | 4 | 2 | float32 | [4 16 32 64 64] | [4 16] | bwd | 912272 | 114960 | 7.94 |

bfloat16

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm |
|---|---|---|---|---|---|---|---|---|---|
| GatherV2 | 0 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 594514 | 236658 | 2.51 |
| GatherV2 | 1 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 472834 | 248161 | 1.91 |
| GatherV2 | 2 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 366961 | 111147 | 3.30 |
| GatherV2 | 3 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 420721 | 87947 | 4.78 |
| GatherV2 | 4 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 493618 | 82115 | 6.01 |
| GatherV2 | 0 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 416145 | 130294 | 3.19 |
| GatherV2 | 1 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 445201 | 117511 | 3.79 |
| GatherV2 | 2 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 430690 | 107502 | 4.01 |
| GatherV2 | 3 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 358578 | 38897 | 9.22 |
| GatherV2 | 0 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 3190058 | 1470390 | 2.17 |
| GatherV2 | 1 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 1907398 | 1150150 | 1.66 |
| GatherV2 | 4 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 2475236 | 365103 | 6.78 |
| GatherV2 | 0 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 9823571 | 5387530 | 1.82 |
| GatherV2 | 1 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 2671833 | 1452310 | 1.84 |
| GatherV2 | 2 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 1596746 | 1028559 | 1.55 |
| GatherV2 | 3 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 1274308 | 631826 | 2.02 |
| GatherV2 | 0 | 0 | bfloat16 | [16 16 32 64 128] | [16 32] | bwd | 52973827 | 43275600 | 1.22 |
| GatherV2 | 1 | 0 | bfloat16 | [16 16 32 64 128] | [16 32] | bwd | 52040938 | 45934000 | 1.13 |
| GatherV2 | 1 | 1 | bfloat16 | [2 2 4 6 8] | [2 4] | bwd | 305423 | 30651 | 9.96 |
| GatherV2 | 2 | 2 | bfloat16 | [2 2 4 6 8] | [2 2] | bwd | 264632 | 27735 | 9.54 |
| GatherV2 | 4 | 2 | bfloat16 | [2 2 4 6 8] | [2 2] | bwd | 265897 | 27095 | 9.81 |
| GatherV2 | 1 | 1 | bfloat16 | [2 4 8 32 64] | [2 8] | bwd | 298350 | 30971 | 9.63 |
| GatherV2 | 4 | 2 | bfloat16 | [2 4 8 32 64] | [2 4] | bwd | 263545 | 28339 | 9.30 |
| GatherV2 | 1 | 1 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 3029180 | 251201 | 12.06 |
| GatherV2 | 2 | 2 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 510302 | 66333 | 7.69 |
| GatherV2 | 4 | 2 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 739456 | 101572 | 7.28 |
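
The issue does not say how the "MIOpen vs ROCm" column is computed or what unit the timings use. The values are consistent with the ROCm time divided by the MIOpen time, rounded to two decimals, so a value above 1 means MIOpen is faster. A minimal sketch that checks this reading against three float16 rows from the benchmark above (the helper name `speedup` is hypothetical):

```python
def speedup(rocm_time, miopen_time):
    """Ratio of ROCm time to MIOpen time, rounded to two decimals."""
    return round(rocm_time / miopen_time, 2)


# (ROCm, MIOpen, reported ratio) taken from float16 rows of the benchmark.
rows = [
    (570977, 244854, 2.33),
    (287308, 45692, 6.29),
    (2953109, 247983, 11.91),
]

for rocm, miopen, reported in rows:
    print(rocm, miopen, speedup(rocm, miopen), reported)
```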

cognaiger9 • Feb 11 '25 03:02