MIOpen
Implement SmoothL1Loss
- Added SmoothL1Loss forward and backward.
- Added a driver test and gtest for both directions of SmoothL1Loss.
- New API is guarded by MIOPEN_BETA_API macro.
- Compared to ROCm PyTorch:
- Backward propagation only shows a performance improvement over ROCm PyTorch when all tensors are contiguous.
float16
| op_name | dtype | size | contiguous | reduction | model | beta | direction | ROCm pytorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|
| SmoothL1Loss | float16 | [7 4] | true | sum | ssd/ssdlite | 1 | fwd | 14520 | 8173 | 1.78 |
| SmoothL1Loss | float16 | [27 4] | true | sum | ssd/ssdlite | 1 | fwd | 14145 | 8263 | 1.71 |
| SmoothL1Loss | float16 | [41 4] | true | sum | ssd/ssdlite | 1 | fwd | 14088 | 8497 | 1.66 |
| SmoothL1Loss | float16 | [62 4] | true | sum | ssd/ssdlite | 1 | fwd | 14189 | 8565 | 1.66 |
| SmoothL1Loss | float16 | [3 4] | true | sum | ssd/ssdlite | 1 | fwd | 13449 | 8140 | 1.65 |
| SmoothL1Loss | float16 | [20 4] | false | sum | ssd/ssdlite | 1 | fwd | 15286 | 8276 | 1.85 |
| SmoothL1Loss | float16 | [3 4] | false | sum | ssd/ssdlite | 1 | fwd | 13516 | 8134 | 1.66 |
| SmoothL1Loss | float16 | [34 4] | false | sum | ssd/ssdlite | 1 | fwd | 13654 | 8541 | 1.60 |
| SmoothL1Loss | float16 | [18 4] | false | sum | ssd/ssdlite | 1 | fwd | 13049 | 8199 | 1.59 |
| SmoothL1Loss | float16 | [22 4] | false | sum | ssd/ssdlite | 1 | fwd | 13062 | 8218 | 1.59 |
| SmoothL1Loss | float16 | [155 4] | false | sum | ssdlite | 1 | bwd | 12649 | 7866 | 1.61 |
| SmoothL1Loss | float16 | [163 4] | false | sum | ssd/ssdlite | 1 | bwd | 11184 | 8011 | 1.40 |
| SmoothL1Loss | float16 | [129 4] | false | sum | ssd/ssdlite | 1 | bwd | 10881 | 7839 | 1.39 |
| SmoothL1Loss | float16 | [98 4] | false | sum | ssdlite | 1 | bwd | 10078 | 7762 | 1.30 |
| SmoothL1Loss | float16 | [108 4] | false | sum | ssd/ssdlite | 1 | bwd | 10073 | 7789 | 1.29 |
float32
| op_name | dtype | size | contiguous | reduction | model | beta | direction | ROCm pytorch | MIOpen HIP | Improvement |
|---|---|---|---|---|---|---|---|---|---|---|
| SmoothL1Loss | float32 | [20 4] | true | sum | ssd/ssdlite | 1 | fwd | 17193 | 8389 | 2.05 |
| SmoothL1Loss | float32 | [7 4] | true | sum | ssd/ssdlite | 1 | fwd | 15565 | 8129 | 1.91 |
| SmoothL1Loss | float32 | [3 4] | true | sum | ssd/ssdlite | 1 | fwd | 13710 | 8102 | 1.69 |
| SmoothL1Loss | float32 | [47 4] | true | sum | ssd/ssdlite | 1 | fwd | 14861 | 8785 | 1.69 |
| SmoothL1Loss | float32 | [34 4] | true | sum | ssd/ssdlite | 1 | fwd | 14504 | 8668 | 1.67 |
| SmoothL1Loss | float32 | [3 4] | false | sum | ssd/ssdlite | 1 | fwd | 13745 | 8154 | 1.69 |
| SmoothL1Loss | float32 | [34 4] | false | sum | ssd/ssdlite | 1 | fwd | 13998 | 8670 | 1.61 |
| SmoothL1Loss | float32 | [22 4] | false | sum | ssd/ssdlite | 1 | fwd | 13561 | 8424 | 1.61 |
| SmoothL1Loss | float32 | [30 4] | false | sum | ssd/ssdlite | 1 | fwd | 13558 | 8423 | 1.61 |
| SmoothL1Loss | float32 | [20 4] | false | sum | ssd/ssdlite | 1 | fwd | 13561 | 8435 | 1.61 |
| SmoothL1Loss | float32 | [104 4] | false | sum | ssd/ssdlite | 1 | bwd | 12889 | 8029 | 1.61 |
| SmoothL1Loss | float32 | [129 4] | false | sum | ssd/ssdlite | 1 | bwd | 12197 | 8120 | 1.50 |
| SmoothL1Loss | float32 | [131 4] | false | sum | ssd/ssdlite | 1 | bwd | 11667 | 8111 | 1.44 |
| SmoothL1Loss | float32 | [137 4] | false | sum | ssdlite | 1 | bwd | 11638 | 8132 | 1.43 |
| SmoothL1Loss | float32 | [155 4] | false | sum | ssdlite | 1 | bwd | 12251 | 8569 | 1.43 |
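The Improvement column in the tables above is consistent with the ROCm pytorch time divided by the MIOpen HIP time, rounded to two decimals. A minimal sketch of that check follows; the helper name `improvement` is hypothetical, and the sample values are copied from rows of the tables.

```python
def improvement(rocm_time, miopen_time):
    """Return the speedup ratio of MIOpen over ROCm PyTorch, rounded to 2 decimals.

    Assumes both timings are in the same unit and that larger ratios mean
    MIOpen is faster (hypothetical helper, not part of MIOpen).
    """
    if miopen_time <= 0:
        raise ValueError("miopen_time must be positive")
    return round(rocm_time / miopen_time, 2)


# (ROCm pytorch, MIOpen HIP) pairs copied from the tables above.
rows = [
    (14520, 8173),  # float16, [7 4], fwd
    (10073, 7789),  # float16, [108 4], bwd
    (17193, 8389),  # float32, [20 4], fwd
    (12251, 8569),  # float32, [155 4], bwd
]

for rocm, miopen in rows:
    print(rocm, miopen, improvement(rocm, miopen))
```

A ratio above 1.0 means the MIOpen HIP kernel took less time than the ROCm PyTorch operator for that case.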
The ROCm PyTorch SmoothL1Loss operator does not support bfloat16, so no comparison is available for that datatype.
- Average improvement over all cases:
| type | average |
|---|---|
| float16 | 1.48 |
| float32 | 1.63 |
| bfloat16 | - |
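For readers who want to sanity-check results from the new forward and backward operations, the standard SmoothL1Loss definition can be sketched in plain Python. This is a CPU reference under stated assumptions, not the MIOpen HIP kernel: the function names are hypothetical, tensors are treated as flat lists, and only the `sum` and `mean` reductions are covered.

```python
def smooth_l1_forward(inputs, targets, beta=1.0, reduction="sum"):
    """Reference SmoothL1Loss forward pass over two equal-length sequences."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    if reduction not in ("sum", "mean"):
        raise ValueError("this sketch only handles 'sum' and 'mean'")
    total = 0.0
    for x, y in zip(inputs, targets):
        d = abs(x - y)
        # Quadratic for |x - y| < beta, linear otherwise.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(inputs) if reduction == "mean" else total


def smooth_l1_backward(inputs, targets, grad_output=1.0, beta=1.0, reduction="sum"):
    """Reference backward pass: returns (grad_input, grad_target) as lists."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    if reduction not in ("sum", "mean"):
        raise ValueError("this sketch only handles 'sum' and 'mean'")
    # A mean reduction spreads the upstream gradient evenly over all elements.
    scale = grad_output / len(inputs) if reduction == "mean" else grad_output
    grad_input = []
    for x, y in zip(inputs, targets):
        diff = x - y
        if abs(diff) < beta:
            grad_input.append(scale * diff / beta)
        else:
            sign = (diff > 0) - (diff < 0)  # -1, 0, or 1
            grad_input.append(scale * sign)
    # The loss depends on (input - target), so the target gradient is the negation.
    grad_target = [-g for g in grad_input]
    return grad_input, grad_target


x = [0.5, 2.0, -1.0]
y = [0.0, 0.0, 1.0]
print(smooth_l1_forward(x, y))   # 3.125
print(smooth_l1_backward(x, y))  # ([0.5, 1.0, -1.0], [-0.5, -1.0, 1.0])
```

With `beta = 1` and `reduction = "sum"`, matching the benchmarked configuration, the three element losses are 0.125, 1.5, and 1.5. When comparing against float16 or bfloat16 device results, a tolerance-based comparison is needed because this sketch runs in double precision.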