MIOpen

Implement Pad Reflection 1D

Status: Open · opened by anhskrttt

  • Add the Pad Reflection 1D operation [ref], with forward and backward passes.
  • Add a driver and gtest tests.
  • Performance conditions (cases where MIOpen outperforms ROCm):
    • The data type is not bfloat16.
    • The operation is 1D padding (i.e. input_num_dims == 2) and padding is applied only to the last dimension (i.e. padding_array.size() == 2).
    • For PadReflectionBackward, the data type is not float16.
      • For float32, MIOpen performs better when the padded dimension is not too large (padded_dim <= 64).
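For reference, the forward and backward semantics can be sketched in plain Python for a single row along the padded last dimension. This is a minimal sketch, not the MIOpen kernel: it assumes the usual reflection rule (the border element is not repeated, as in PyTorch's `ReflectionPad1d`) and reads a padding pair such as [3 5] as (left, right). All function names are hypothetical. The backward pass scatters each output gradient back onto the input element it was copied from, so elements near the borders accumulate several contributions.

```python
from typing import List, Sequence


def reflect_index(i: int, pad_left: int, width: int) -> int:
    """Map output position i to the input position it copies (border not repeated)."""
    j = i - pad_left
    if j < 0:             # inside the left pad: mirror around input index 0
        j = -j
    elif j >= width:      # inside the right pad: mirror around input index width-1
        j = 2 * (width - 1) - j
    return j


def check_pads(width: int, pad_left: int, pad_right: int) -> None:
    # Reflection needs each pad to be strictly smaller than the padded dimension.
    if pad_left >= width or pad_right >= width:
        raise ValueError("each pad must be smaller than the padded dimension")


def pad_reflection_1d_fwd(x: Sequence[float], pad_left: int, pad_right: int) -> List[float]:
    """Hypothetical forward: copy each output element from its reflected input."""
    width = len(x)
    check_pads(width, pad_left, pad_right)
    out_width = width + pad_left + pad_right
    return [x[reflect_index(i, pad_left, width)] for i in range(out_width)]


def pad_reflection_1d_bwd(
    grad_out: Sequence[float], width: int, pad_left: int, pad_right: int
) -> List[float]:
    """Hypothetical backward: scatter-add each output gradient onto its source."""
    check_pads(width, pad_left, pad_right)
    if len(grad_out) != width + pad_left + pad_right:
        raise ValueError("grad_out has the wrong length")
    grad_in = [0.0] * width
    for i, g in enumerate(grad_out):
        grad_in[reflect_index(i, pad_left, width)] += g
    return grad_in
```

For example, `pad_reflection_1d_fwd([1, 2, 3, 4], 2, 1)` returns `[3, 2, 1, 2, 3, 4, 3]`, and an all-ones gradient of length 7 maps back to `[1.0, 2.0, 3.0, 1.0]`. For an [N C W] input, the same row operation would be applied independently to every (n, c) row.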

Average improvement over ROCm

| type | fwd | bwd |
|---|---|---|
| float32 | 1.48 | 1.73 |
| float16 | 1.40 | - |
| bfloat16 | - | - |

Detailed benchmarks

fp32_fwd
| dtype | input_size | contiguous | padding | direction | improvement over ROCm |
|---|---|---|---|---|---|
| float32 | [1024 256 8] | noncontiguous | [3 5] | fwd | 5.5431 |
| float32 | [1024 64 8] | noncontiguous | [3 5] | fwd | 4.3195 |
| float32 | [256 256 8] | noncontiguous | [3 5] | fwd | 4.3056 |
| float32 | [1024 256 16] | noncontiguous | [3 5] | fwd | 4.0766 |
| float32 | [1024 64 16] | noncontiguous | [3 5] | fwd | 3.5369 |
| float32 | [256 256 16] | noncontiguous | [3 5] | fwd | 3.3904 |
| float32 | [8 2 64] | noncontiguous | [2 2] | fwd | 3.1042 |
| float32 | [16 2 64] | noncontiguous | [2 2] | fwd | 2.7072 |
| float32 | [16 8 16] | noncontiguous | [2 2] | fwd | 2.5312 |
| float32 | [16 4 64] | noncontiguous | [3 5] | fwd | 2.4136 |
| float32 | [1024 16 8] | noncontiguous | [3 5] | fwd | 2.3902 |
| float32 | [32 8 16] | noncontiguous | [2 2] | fwd | 2.3676 |
| float32 | [8 8 16] | noncontiguous | [2 2] | fwd | 2.3544 |
| float32 | [16 4 16] | noncontiguous | [2 2] | fwd | 2.3374 |
| float32 | [32 2 16] | noncontiguous | [2 2] | fwd | 2.3363 |
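Every forward row above uses a noncontiguous input. One way to handle such inputs, sketched below under our own assumptions, is to compute each input element's address from the view's element strides instead of assuming a packed layout; each loop iteration stands in for one GPU thread producing one contiguous output element. The reading of input_size as [N C W], the (left, right) reading of padding, and the function name are all hypothetical, and this is not a description of the MIOpen kernel.

```python
from typing import List, Sequence, Tuple


def pad_reflection_1d_fwd_strided(
    storage: Sequence[float],
    shape: Tuple[int, int, int],    # logical [N, C, W] of the input view
    strides: Tuple[int, int, int],  # element strides of the input view
    pad_left: int,
    pad_right: int,
) -> List[float]:
    """Hypothetical forward over a strided (possibly noncontiguous) input view."""
    n, c, w = shape
    if pad_left >= w or pad_right >= w:
        raise ValueError("each pad must be smaller than the padded dimension")
    out_w = w + pad_left + pad_right
    out = [0.0] * (n * c * out_w)  # output is written contiguously
    for idx in range(len(out)):    # one iteration per output element
        ni, rem = divmod(idx, c * out_w)
        ci, oi = divmod(rem, out_w)
        j = oi - pad_left
        if j < 0:                  # left pad: mirror around index 0
            j = -j
        elif j >= w:               # right pad: mirror around index w-1
            j = 2 * (w - 1) - j
        # Address the input through its strides, so any layout works.
        out[idx] = storage[ni * strides[0] + ci * strides[1] + j * strides[2]]
    return out
```

Passing the packed strides (C·W, W, 1) gives the contiguous case. A transposed view, such as an [N W C] buffer read as [N C W] with strides (W·C, 1, C), exercises the noncontiguous path without copying the data first.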
fp16_fwd
| dtype | input_size | contiguous | padding | direction | improvement over ROCm |
|---|---|---|---|---|---|
| float16 | [512 256 8] | noncontiguous | [3 5] | fwd | 5.042308539 |
| float16 | [512 256 16] | noncontiguous | [3 5] | fwd | 3.793880455 |
| float16 | [512 64 8] | noncontiguous | [3 5] | fwd | 3.229970638 |
| float16 | [128 256 8] | noncontiguous | [3 5] | fwd | 3.061361755 |
| float16 | [512 64 16] | noncontiguous | [3 5] | fwd | 2.99377916 |
| float16 | [16 4 16] | noncontiguous | [2 2] | fwd | 2.407523511 |
| float16 | [4 2 16] | noncontiguous | [2 2] | fwd | 2.333333333 |
| float16 | [32 2 16] | noncontiguous | [3 5] | fwd | 2.242424242 |
| float16 | [16 2 16] | noncontiguous | [2 2] | fwd | 2.07073955 |
| float16 | [8 4 16] | noncontiguous | [2 2] | fwd | 2.067524116 |
| float16 | [32 16 16] | noncontiguous | [3 5] | fwd | 1.971631206 |
| float16 | [512 16 16] | noncontiguous | [3 5] | fwd | 1.854805726 |
| float16 | [512 16 8] | noncontiguous | [3 5] | fwd | 1.847280335 |
| float16 | [128 64 16] | noncontiguous | [3 5] | fwd | 1.825203252 |
| float16 | [128 16 8] | noncontiguous | [3 5] | fwd | 1.816377171 |
fp32_bwd
| dtype | input_size | contiguous | padding | direction | improvement over ROCm |
|---|---|---|---|---|---|
| float32 | [1024 256 8] | noncontiguous | [3 5] | bwd | 4.031944207 |
| float32 | [512 256 8] | contiguous | [3 5] | bwd | 3.871722166 |
| float32 | [512 256 16] | contiguous | [3 5] | bwd | 3.489330757 |
| float32 | [1024 256 16] | noncontiguous | [3 5] | bwd | 3.31064466 |
| float32 | [256 256 8] | noncontiguous | [3 5] | bwd | 2.98488121 |
| float32 | [128 256 8] | contiguous | [3 5] | bwd | 2.818461538 |
| float32 | [512 64 8] | contiguous | [3 5] | bwd | 2.809829603 |
| float32 | [512 64 16] | contiguous | [3 5] | bwd | 2.736646341 |
| float32 | [128 256 16] | contiguous | [3 5] | bwd | 2.733203505 |
| float32 | [256 256 16] | noncontiguous | [3 5] | bwd | 2.580315959 |
| float32 | [1024 64 8] | noncontiguous | [3 5] | bwd | 2.494184734 |
| float32 | [1024 64 16] | noncontiguous | [3 5] | bwd | 2.330557868 |
| float32 | [128 64 8] | contiguous | [3 5] | bwd | 1.495726496 |
| float32 | [512 16 8] | contiguous | [3 5] | bwd | 1.492063492 |
| float32 | [32 256 8] | contiguous | [3 5] | bwd | 1.475352113 |
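The backward pass can also be written as a gather rather than a scatter: each input position sums the at most three output gradients that reflect onto it (its own copy, one mirror in the left pad, one mirror in the right pad), so no two workers write the same element and no atomic adds are needed. This is a hypothetical sketch for one row, assuming both pads are smaller than the padded dimension and that padding is given as (left, right); the issue does not say which strategy the MIOpen backward kernel uses.

```python
from typing import List, Sequence


def pad_reflection_1d_bwd_gather(
    grad_out: Sequence[float], width: int, pad_left: int, pad_right: int
) -> List[float]:
    """Hypothetical gather-style backward for reflection padding on one row."""
    if pad_left >= width or pad_right >= width:
        raise ValueError("each pad must be smaller than the padded dimension")
    if len(grad_out) != width + pad_left + pad_right:
        raise ValueError("grad_out has the wrong length")
    grad_in = [0.0] * width
    for j in range(width):  # one independent worker per input element
        acc = grad_out[j + pad_left]  # the unreflected copy of x[j]
        # Left pad: output pad_left - j reflects onto j when 1 <= j <= pad_left.
        if 1 <= j <= pad_left:
            acc += grad_out[pad_left - j]
        # Right pad: output pad_left + 2*(width-1) - j reflects onto j
        # when width-1-pad_right <= j <= width-2.
        if width - 1 - pad_right <= j <= width - 2:
            acc += grad_out[pad_left + 2 * (width - 1) - j]
        grad_in[j] = acc
    return grad_in
```

For example, with width 4 and padding (2, 1), an all-ones gradient of length 7 yields `[1.0, 2.0, 3.0, 1.0]`. Because every output position reflects onto exactly one input position, the sum of `grad_in` always equals the sum of `grad_out`, which makes a cheap sanity check.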

anhskrttt · Mar 17 '25 03:03