Implement Pdist Backward

Open anhskrttt opened this issue 10 months ago • 0 comments

  • Add Pdist operation with backward kernels.
  • Add driver and gtest for kernels.
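
For reference, the computation the backward kernels need to reproduce can be sketched in plain Python. This is a minimal sketch, not the MIOpen kernel: it assumes Pdist follows the usual definition (p-norm distance between every pair of rows `i < j`, stored in row-major upper-triangle order) and a finite `p >= 1`. The function names `pdist_forward` and `pdist_backward` are hypothetical.

```python
import math


def pdist_forward(x, p=2.0):
    """Pairwise p-norm distances between rows of x, for i < j (assumed ordering)."""
    n = len(x)
    out = []
    for i in range(n):
        for j in range(i + 1, n):
            s = sum(abs(a - b) ** p for a, b in zip(x[i], x[j]))
            out.append(s ** (1.0 / p))
    return out


def pdist_backward(x, grad_out, p=2.0):
    """Gradient of the distances w.r.t. x, given grad_out (one value per pair)."""
    n = len(x)
    m = len(x[0]) if n else 0
    grad_x = [[0.0] * m for _ in range(n)]
    k = 0  # index into grad_out, same pair order as pdist_forward
    for i in range(n):
        for j in range(i + 1, n):
            diffs = [a - b for a, b in zip(x[i], x[j])]
            dist = sum(abs(d) ** p for d in diffs) ** (1.0 / p)
            # Assumption: identical rows (dist == 0) contribute a zero gradient.
            if dist > 0.0:
                scale = grad_out[k] / dist ** (p - 1.0)
                for c, d in enumerate(diffs):
                    if d != 0.0:
                        # d(dist)/d(x[i][c]) = sign(d) * |d|^(p-1) / dist^(p-1)
                        g = scale * math.copysign(abs(d) ** (p - 1.0), d)
                        grad_x[i][c] += g
                        grad_x[j][c] -= g
            k += 1
    return grad_x
```

A sketch like this is mainly useful as a CPU reference when writing the gtest: each pair contributes equal and opposite gradients to its two rows, so the per-column sum of the result should be zero.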

Average improvement over ROCm

| type | bwd |
|---|---|
| float | 1.65 |
| float16 | - |
| bfloat16 | - |
  • ROCm PyTorch doesn't support float16 or bfloat16 for this operation, so those cases count as wins for MIOpen by default.

Detail Benchmark

fp32
| dtype | size | is_contiguous | ROCm | MIOpen | Improvement |
|---|---|---|---|---|---|
| float32 | [2 5] | noncontiguous | 191678 | 33119 | 5.787554 |
| float32 | [16 255] | noncontiguous | 178239 | 32266 | 5.524050 |
| float32 | [25 100] | noncontiguous | 157438 | 33741 | 4.666074 |
| float32 | [5 10] | contiguous | 104480 | 22577 | 4.627718 |
| float32 | [25 300] | noncontiguous | 151519 | 33333 | 4.545615 |
| float32 | [2000 3000] | contiguous | 291086060 | 92748000 | 3.138462 |
| float32 | [10 65536] | noncontiguous | 328958 | 109723 | 2.998077 |
| float32 | [100 10] | noncontiguous | 123679 | 45119 | 2.741173 |
| float32 | [100 20] | contiguous | 96960 | 43039 | 2.252840 |
| float32 | [800 2329] | contiguous | 43446243 | 19770900 | 2.197484 |
| float32 | [800 498] | contiguous | 8592417 | 3999580 | 2.148330 |
| float32 | [1044 80] | contiguous | 1641588 | 1301320 | 1.261479 |
| float32 | [5199 80] | contiguous | 37923721 | 30627700 | 1.238216 |
| float32 | [4918 80] | contiguous | 33406313 | 29107700 | 1.147680 |
| float32 | [4558 80] | contiguous | 28701388 | 25077400 | 1.144512 |
| float32 | [993 80] | noncontiguous | 1499829 | 1326410 | 1.130743 |
| float32 | [2048 512] | noncontiguous | 83486114 | 74272700 | 1.124048 |
| float32 | [128 128] | noncontiguous | 134079 | 120408 | 1.113539 |
| float32 | [237 80] | contiguous | 164639 | 148390 | 1.109502 |
| float32 | [4061 80] | noncontiguous | 22037119 | 20823500 | 1.058281 |
| float32 | [258 80] | contiguous | 172159 | 162719 | 1.058014 |
| float32 | [4077 80] | noncontiguous | 22334398 | 21159900 | 1.055506 |
| float32 | [4050 80] | noncontiguous | 21789601 | 20755800 | 1.049808 |
| float32 | [4508 80] | noncontiguous | 28472270 | 27136500 | 1.049224 |
| float32 | [285 80] | contiguous | 186239 | 184105 | 1.011591 |
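
The Improvement column is not defined in the issue, but the numbers are consistent with the ROCm timing divided by the MIOpen timing, so values above 1 mean MIOpen is faster. A small sketch that recomputes it for three rows copied from the table (the helper name `improvement` is hypothetical, and the timing unit is whatever the benchmark reported):

```python
def improvement(rocm_time, miopen_time):
    """Speedup of MIOpen over ROCm, assuming both timings use the same unit."""
    return rocm_time / miopen_time


# (size, ROCm, MIOpen) taken from the fp32 table
rows = [
    ("[2 5]", 191678, 33119),
    ("[2000 3000]", 291086060, 92748000),
    ("[285 80]", 186239, 184105),
]

for size, rocm, miopen in rows:
    print(f"{size}: {improvement(rocm, miopen):.6f}")
```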

anhskrttt, Feb 14 '25 07:02