Implement RoIAlign
- Add RoIAlign operation [ref] for both forward and backward.
- Add driver and gtest.
- Performance condition:
  - For `RoIAlignForward`, MIOpen performs better if input and output tensors are contiguous.
    - ROCm PyTorch's `RoIAlign` doesn't support `bfloat16` yet, so this is an always winning case.
  - For `RoIAlignBackward`, MIOpen performs better if:
    - Input tensor is `float16` type
    - (Input tensors are contiguous) or (Input tensors are noncontiguous but `input_numel` size is smaller)
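
The performance conditions above can be written as a small predicate. This is only a sketch: the function name, the string arguments, and `small_numel_limit` are hypothetical. The PR does not state a threshold for when `input_numel` counts as "smaller", so the caller has to supply one. It also assumes the two backward bullets must both hold.

```python
def miopen_expected_faster(direction, dtype, contiguous, input_numel, small_numel_limit):
    """Return True when the stated conditions suggest MIOpen should win.

    All names here are illustrative; `small_numel_limit` is an assumed,
    caller-chosen cutoff because the PR gives no concrete value.
    """
    if direction == "fwd":
        # bfloat16 has no ROCm PyTorch counterpart to compare against,
        # so it is treated as always winning.
        return dtype == "bfloat16" or contiguous
    if direction == "bwd":
        # Assumption: float16 AND (contiguous OR small noncontiguous input).
        return dtype == "float16" and (contiguous or input_numel < small_numel_limit)
    raise ValueError(f"unknown direction: {direction!r}")
```

For example, `miopen_expected_faster("bwd", "float32", True, 1024, 4096)` returns `False` under this reading, in line with the float backward average being below 1.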
Average improvement over ROCm
| type | fwd | bwd |
|---|---|---|
| float | 6.24 | 0.97 |
| float16 | 6.32 | 1.21 |
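
For readers of the detailed tables, the `spatial_scale`, `sampling_ratio`, and output-size columns are the usual RoIAlign parameters. The following is a minimal single-channel reference sketch in plain Python. It follows the widely used formulation (as in torchvision's `roi_align` with `aligned=False`) and is not taken from the MIOpen kernel in this PR. The names `bilinear` and `roi_align_single` are hypothetical, and the box is passed as `(x1, y1, x2, y2)` without the leading batch index that a `[K 5]` RoI tensor normally carries.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly sample a 2-D list `feat` (H x W) at a fractional point."""
    h, w = len(feat), len(feat[0])
    # Points more than one pixel outside the map contribute zero.
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    if y0 >= h - 1:
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    hy, hx = 1.0 - ly, 1.0 - lx
    return (hy * hx * feat[y0][x0] + hy * lx * feat[y0][x1]
            + ly * hx * feat[y1][x0] + ly * lx * feat[y1][x1])

def roi_align_single(feat, roi, out_h, out_w, spatial_scale, sampling_ratio):
    """Pool one (x1, y1, x2, y2) box from one channel into out_h x out_w."""
    x1, y1, x2, y2 = (c * spatial_scale for c in roi)
    roi_w = max(x2 - x1, 1.0)
    roi_h = max(y2 - y1, 1.0)
    bin_h, bin_w = roi_h / out_h, roi_w / out_w
    # sampling_ratio <= 0 means adaptive: ceil(roi size / output size).
    grid_h = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_h / out_h)
    grid_w = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_w / out_w)
    out = [[0.0] * out_w for _ in range(out_h)]
    for ph in range(out_h):
        for pw in range(out_w):
            acc = 0.0
            for iy in range(grid_h):
                yy = y1 + ph * bin_h + (iy + 0.5) * bin_h / grid_h
                for ix in range(grid_w):
                    xx = x1 + pw * bin_w + (ix + 0.5) * bin_w / grid_w
                    acc += bilinear(feat, yy, xx)
            # Each output bin is the mean of its sampled points.
            out[ph][pw] = acc / (grid_h * grid_w)
    return out
```

With `sampling_ratio = -1` the number of samples per bin grows with the RoI size, while `sampling_ratio = 2` fixes it at 2 x 2, which is one reason the two settings can time differently.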
Detailed Benchmark
fp32_fwd
| input_shape | rois_shape | output_shape | is_contiguous | spatial_scale | sampling_ratio | ROCm | MIOpen | improvement |
|---|---|---|---|---|---|---|---|---|
| [16 256 160 160] | [6 5] | [7 7] | TRUE | 0.0625 | -1 | 838074 | 11022 | 76.04 |
| [6 256 200 320] | [6 5] | [7 7] | TRUE | 0.0625 | -1 | 738074 | 13866 | 53.23 |
| [16 256 160 160] | [6 5] | [7 7] | TRUE | 0.25 | 2 | 847994 | 16337 | 51.91 |
| [6 256 200 320] | [6 5] | [7 7] | TRUE | 0.25 | -1 | 755995 | 33812 | 22.36 |
| [16 256 160 160] | [6 5] | [28 28] | TRUE | 0.0625 | -1 | 875194 | 66168 | 13.23 |
| [6 256 200 320] | [6 5] | [28 28] | TRUE | 0.25 | -1 | 780314 | 76088 | 10.26 |
| [16 256 160 160] | [6 5] | [28 28] | TRUE | 0.25 | 2 | 905273 | 112017 | 8.08 |
| [6 256 200 320] | [6 5] | [28 28] | TRUE | 0.0625 | 2 | 797754 | 111875 | 7.13 |
| [16 256 160 160] | [512 5] | [7 7] | TRUE | 0.0625 | -1 | 1028473 | 347748 | 2.96 |
| [6 256 200 320] | [512 5] | [7 7] | TRUE | 0.0625 | 2 | 1098552 | 575622 | 1.91 |
| [16 256 160 160] | [512 5] | [7 7] | TRUE | 1 | 2 | 1881266 | 1121320 | 1.68 |
| [1 1 64 64] | [512 5] | [7 7] | TRUE | 0.25 | -1 | 16960 | 12817 | 1.32 |
| [6 1 800 1060] | [512 5] | [7 7] | TRUE | 0.25 | 2 | 20480 | 14684 | 1.39 |
| [6 256 200 320] | [512 5] | [7 7] | TRUE | 0.25 | -1 | 1757267 | 1468540 | 1.20 |
| [1 3 96 96] | [6 5] | [28 28] | TRUE | 0.25 | -1 | 12480 | 10062 | 1.24 |
| [6 256 200 320] | [2149 5] | [7 7] | TRUE | 0.25 | 2 | 2980138 | 2481650 | 1.20 |
| [1 1 64 64] | [512 5] | [7 7] | TRUE | 0.0625 | 2 | 17760 | 14897 | 1.19 |
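
The improvement column appears to be the ROCm time divided by the MIOpen time, rounded to two decimals. That reading is an inference from the numbers, not something the PR states, and the time unit is not given. A quick check against the first two fp32_fwd rows, using a hypothetical helper `improvement`:

```python
def improvement(rocm_time, miopen_time):
    """Speedup ratio of MIOpen over ROCm, assuming both times share a unit."""
    return round(rocm_time / miopen_time, 2)

# (ROCm, MIOpen) pairs copied from the first two fp32_fwd rows.
rows = [(838074, 11022), (738074, 13866)]
ratios = [improvement(rocm, miopen) for rocm, miopen in rows]
print(ratios)  # [76.04, 53.23], matching the table
```

A value above 1 means MIOpen was faster for that configuration; a value below 1 means ROCm was faster.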
fp16_fwd
| input_shape | rois_shape | output_shape | is_contiguous | spatial_scale | sampling_ratio | ROCm | MIOpen | improvement |
|---|---|---|---|---|---|---|---|---|
| [16 256 160 160] | [6 5] | [7 7] | TRUE | 0.0625 | -1 | 773275 | 10631 | 72.74 |
| [6 256 200 320] | [6 5] | [7 7] | TRUE | 0.0625 | -1 | 687035 | 12906 | 53.23 |
| [16 256 160 160] | [6 5] | [7 7] | TRUE | 0.0625 | 2 | 775515 | 14950 | 51.87 |
| [16 256 160 160] | [6 5] | [7 7] | TRUE | 0.25 | 2 | 784474 | 15430 | 50.84 |
| [6 256 200 320] | [6 5] | [7 7] | TRUE | 0.0625 | 2 | 691834 | 14577 | 47.46 |
| [16 256 160 160] | [6 5] | [7 7] | TRUE | 1 | 2 | 777275 | 18133 | 42.87 |
| [6 256 200 320] | [6 5] | [7 7] | TRUE | 0.25 | 2 | 692795 | 16248 | 42.64 |
| [16 256 160 160] | [6 5] | [7 7] | TRUE | 0.25 | -1 | 779514 | 21262 | 36.66 |
| [6 256 200 320] | [6 5] | [7 7] | TRUE | 1 | 2 | 704475 | 21741 | 32.40 |
| [6 256 200 320] | [6 5] | [7 7] | TRUE | 0.25 | -1 | 699675 | 32070 | 21.82 |
| [16 256 160 160] | [6 5] | [28 28] | TRUE | 0.0625 | -1 | 805754 | 65155 | 12.37 |
| [16 256 160 160] | [6 5] | [28 28] | TRUE | 0.25 | -1 | 806394 | 65386 | 12.33 |
fp16_bwd
| input_shape | rois_shape | output_shape | is_contiguous | spatial_scale | sampling_ratio | ROCm | MIOpen | improvement |
|---|---|---|---|---|---|---|---|---|
| [3 1 800 1060] | [512 5] | [28 28] | TRUE | 0.25 | 2 | 3776773 | 1101540 | 3.43 |
| [1 1 800 1201] | [512 5] | [28 28] | TRUE | 0.25 | 2 | 4223650 | 1260380 | 3.35 |
| [2 1 800 1060] | [512 5] | [28 28] | TRUE | 0.25 | 2 | 2998538 | 1017500 | 2.95 |
| [1 1 800 1201] | [6 5] | [28 28] | TRUE | 0.0625 | -1 | 1359510 | 475799 | 2.86 |
| [3 1 800 1060] | [512 5] | [28 28] | TRUE | 1 | 2 | 1239991 | 446023 | 2.78 |
| [1 1 64 64] | [6 5] | [28 28] | FALSE | 0.0625 | -1 | 5180762 | 1944090 | 2.66 |
| [1 1 64 64] | [6 5] | [28 28] | TRUE | 0.0625 | -1 | 4979164 | 1887840 | 2.64 |
| [1 1 64 64] | [6 5] | [28 28] | FALSE | 0.25 | 2 | 4175810 | 1677920 | 2.49 |
| [1 1 800 1201] | [512 5] | [28 28] | TRUE | 1 | 2 | 1037592 | 455659 | 2.28 |
| [6 1 800 1060] | [2149 5] | [28 28] | TRUE | 0.25 | 2 | 7006829 | 3170180 | 2.21 |
| [6 256 200 320] | [512 5] | [28 28] | TRUE | 1 | 2 | 113196296 | 68001300 | 1.66 |
| [6 1 800 1060] | [512 5] | [28 28] | TRUE | 0.25 | 2 | 1637108 | 993130 | 1.65 |
| [1 3 96 96] | [512 5] | [28 28] | FALSE | 0.25 | -1 | 11071280 | 6859840 | 1.61 |
| [2 1 800 1060] | [512 5] | [28 28] | TRUE | 1 | 2 | 689115 | 427890 | 1.61 |
| [1 1 64 64] | [2149 5] | [28 28] | TRUE | 0.25 | -1 | 38307082 | 23917300 | 1.60 |
| [1 3 96 96] | [2149 5] | [28 28] | TRUE | 0.0625 | -1 | 39848670 | 25024400 | 1.59 |
| [1 1 64 64] | [2149 5] | [28 28] | FALSE | 0.0625 | 2 | 14476410 | 10476100 | 1.38 |
| [1 3 96 96] | [512 5] | [7 7] | FALSE | 1 | 2 | 1018780 | 743758 | 1.37 |
| [6 1 800 1060] | [2149 5] | [28 28] | TRUE | 0.0625 | -1 | 4765470 | 3529800 | 1.35 |
| [1 3 96 96] | [512 5] | [28 28] | TRUE | 0.0625 | -1 | 5826490 | 4406120 | 1.32 |
| [1 1 64 64] | [512 5] | [28 28] | FALSE | 0.25 | -1 | 3054872 | 2344600 | 1.30 |
@anhskrttt Do we need a changelog entry for this addition?
@amd-jnovotny I couldn't find any MIOpen documentation specifying which types of additions require a changelog entry. I noticed that similar ops, such as PReLU and GLU, have been included in the changelog. Based on that, I think this op could also be added.
cc @long10024070 Do you see any issues with adding this op to the changelog?
@amd-jnovotny I think editing the changelog should be handled entirely on the AMD side. It is not our decision.
@long10024070 thanks for the feedback. @BrianHarrisonAMD Should we add an entry to the MIOpen changelog? Maybe as part of a separate PR? If you can add something, I can review it.
MIOpen is moving to the new monorepo setup and all older unmerged PRs are being closed. Please re-open this as part of the new repo if these changes are still needed.