
Impl SmoothL1Loss

long10024070 opened this issue on Oct 31, 2024

  • Added SmoothL1Loss forward and backward (a reference sketch of the math follows this list).
  • Added a driver test and a gtest for both directions of SmoothL1Loss.
  • The new API is guarded by the MIOPEN_BETA_API macro.
  • Compared to ROCm pytorch:
    • Backward propagation only shows a performance improvement over ROCm when all tensors are contiguous.
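
For reference, the sketch below shows the usual SmoothL1Loss definition as a plain host-side C++ function for the sum-reduced forward pass and its input gradient, matching the configuration benchmarked here (beta = 1, sum reduction). It is only an illustrative sketch assuming contiguous 1-D buffers; the function names are hypothetical and this is not the MIOpen kernel or the new MIOPEN_BETA_API entry points.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference sketch (not the MIOpen kernel): SmoothL1Loss with sum reduction.
// Per-element loss:
//   l_i = 0.5 * (x_i - y_i)^2 / beta   if |x_i - y_i| < beta
//       = |x_i - y_i| - 0.5 * beta     otherwise
float SmoothL1LossForwardSum(const std::vector<float>& input,
                             const std::vector<float>& target,
                             float beta)
{
    float loss = 0.0f;
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        const float diff = input[i] - target[i];
        const float ad   = std::fabs(diff);
        loss += (ad < beta) ? 0.5f * diff * diff / beta : ad - 0.5f * beta;
    }
    return loss;
}

// Gradient of the sum-reduced loss w.r.t. the input, scaled by the upstream
// gradient dLoss (a single scalar for sum reduction).
void SmoothL1LossBackwardSum(const std::vector<float>& input,
                             const std::vector<float>& target,
                             float beta,
                             float dLoss,
                             std::vector<float>& dInput)
{
    dInput.resize(input.size());
    for(std::size_t i = 0; i < input.size(); ++i)
    {
        const float diff = input[i] - target[i];
        dInput[i] = dLoss * ((std::fabs(diff) < beta) ? diff / beta
                                                      : (diff > 0 ? 1.0f : -1.0f));
    }
}
```

With sum reduction the upstream gradient is a single scalar; for mean reduction each element's gradient would additionally be divided by the element count.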
float16

| op_name | dtype | size | contiguous | reduction | model | beta | direction | ROCm pytorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|
| SmoothL1Loss | float16 | [7 4] | true | sum | ssd/ssdlite | 1 | fwd | 14520 | 8173 | 1.78 |
| SmoothL1Loss | float16 | [27 4] | true | sum | ssd/ssdlite | 1 | fwd | 14145 | 8263 | 1.71 |
| SmoothL1Loss | float16 | [41 4] | true | sum | ssd/ssdlite | 1 | fwd | 14088 | 8497 | 1.66 |
| SmoothL1Loss | float16 | [62 4] | true | sum | ssd/ssdlite | 1 | fwd | 14189 | 8565 | 1.66 |
| SmoothL1Loss | float16 | [3 4] | true | sum | ssd/ssdlite | 1 | fwd | 13449 | 8140 | 1.65 |
| SmoothL1Loss | float16 | [20 4] | false | sum | ssd/ssdlite | 1 | fwd | 15286 | 8276 | 1.85 |
| SmoothL1Loss | float16 | [3 4] | false | sum | ssd/ssdlite | 1 | fwd | 13516 | 8134 | 1.66 |
| SmoothL1Loss | float16 | [34 4] | false | sum | ssd/ssdlite | 1 | fwd | 13654 | 8541 | 1.60 |
| SmoothL1Loss | float16 | [18 4] | false | sum | ssd/ssdlite | 1 | fwd | 13049 | 8199 | 1.59 |
| SmoothL1Loss | float16 | [22 4] | false | sum | ssd/ssdlite | 1 | fwd | 13062 | 8218 | 1.59 |
| SmoothL1Loss | float16 | [155 4] | false | sum | ssdlite | 1 | bwd | 12649 | 7866 | 1.61 |
| SmoothL1Loss | float16 | [163 4] | false | sum | ssd/ssdlite | 1 | bwd | 11184 | 8011 | 1.40 |
| SmoothL1Loss | float16 | [129 4] | false | sum | ssd/ssdlite | 1 | bwd | 10881 | 7839 | 1.39 |
| SmoothL1Loss | float16 | [98 4] | false | sum | ssdlite | 1 | bwd | 10078 | 7762 | 1.30 |
| SmoothL1Loss | float16 | [108 4] | false | sum | ssd/ssdlite | 1 | bwd | 10073 | 7789 | 1.29 |
float32

| op_name | dtype | size | contiguous | reduction | model | beta | direction | ROCm pytorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|
| SmoothL1Loss | float32 | [20 4] | true | sum | ssd/ssdlite | 1 | fwd | 17193 | 8389 | 2.05 |
| SmoothL1Loss | float32 | [7 4] | true | sum | ssd/ssdlite | 1 | fwd | 15565 | 8129 | 1.91 |
| SmoothL1Loss | float32 | [3 4] | true | sum | ssd/ssdlite | 1 | fwd | 13710 | 8102 | 1.69 |
| SmoothL1Loss | float32 | [47 4] | true | sum | ssd/ssdlite | 1 | fwd | 14861 | 8785 | 1.69 |
| SmoothL1Loss | float32 | [34 4] | true | sum | ssd/ssdlite | 1 | fwd | 14504 | 8668 | 1.67 |
| SmoothL1Loss | float32 | [3 4] | false | sum | ssd/ssdlite | 1 | fwd | 13745 | 8154 | 1.69 |
| SmoothL1Loss | float32 | [34 4] | false | sum | ssd/ssdlite | 1 | fwd | 13998 | 8670 | 1.61 |
| SmoothL1Loss | float32 | [22 4] | false | sum | ssd/ssdlite | 1 | fwd | 13561 | 8424 | 1.61 |
| SmoothL1Loss | float32 | [30 4] | false | sum | ssd/ssdlite | 1 | fwd | 13558 | 8423 | 1.61 |
| SmoothL1Loss | float32 | [20 4] | false | sum | ssd/ssdlite | 1 | fwd | 13561 | 8435 | 1.61 |
| SmoothL1Loss | float32 | [104 4] | false | sum | ssd/ssdlite | 1 | bwd | 12889 | 8029 | 1.61 |
| SmoothL1Loss | float32 | [129 4] | false | sum | ssd/ssdlite | 1 | bwd | 12197 | 8120 | 1.50 |
| SmoothL1Loss | float32 | [131 4] | false | sum | ssd/ssdlite | 1 | bwd | 11667 | 8111 | 1.44 |
| SmoothL1Loss | float32 | [137 4] | false | sum | ssdlite | 1 | bwd | 11638 | 8132 | 1.43 |
| SmoothL1Loss | float32 | [155 4] | false | sum | ssdlite | 1 | bwd | 12251 | 8569 | 1.43 |
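
In both tables, lower values in the ROCm pytorch and MIOpen HIP columns are better, and Improvement is their ratio (ROCm pytorch / MIOpen HIP); for example, the first float16 row gives 14520 / 8173 ≈ 1.78.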

No comparison is shown for bfloat16 because the ROCm pytorch SmoothL1Loss operator does not support that datatype.

  • Average over all cases:

| type | average |
|---|---|
| float16 | 1.48 |
| float32 | 1.63 |
| bfloat16 | - |

long10024070 · Oct 31 '24 08:10