Implement Pad Reflection 1D
Open: anhskrttt opened this issue 9 months ago
- Add Pad Reflection 1D operation [ref] for forward and backward.
- Add driver and gtest.
- Performance condition:
  - Not type `bfloat16`
  - MIOpen performs better if it's padding1D (i.e. input_num_dims == 2) and pad only to last dim (i.e. padding_array.size() == 2)
  - For `PadReflectionBackward`: Not type float16
  - float32: MIOpen performs better if padding to dim that is not too large (padded_dim <= 64).
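
For readers unfamiliar with the operation, here is a minimal pure-Python sketch of what reflection padding along the last dimension computes in the forward and backward directions. It is a reference illustration only, not the MIOpen kernel or its API: the function names `reflect_index`, `pad_reflection_1d_forward`, and `pad_reflection_1d_backward` are hypothetical, and the sketch handles a single row (one slice along the padded dimension), on the assumption that each row is padded independently.

```python
def reflect_index(j, pad_left, length):
    """Map output position j to the input index it mirrors (edge value not repeated)."""
    i = j - pad_left
    if i < 0:
        i = -i                        # reflect across the left edge
    elif i >= length:
        i = 2 * (length - 1) - i      # reflect across the right edge
    return i


def _check_pads(length, pad_left, pad_right):
    # Reflection is only well defined when each pad is smaller than the row length.
    if length <= 0 or not (0 <= pad_left < length and 0 <= pad_right < length):
        raise ValueError("each pad must be non-negative and smaller than the padded dimension")


def pad_reflection_1d_forward(row, pad_left, pad_right):
    """Forward: every output element is a copy of one input element."""
    n = len(row)
    _check_pads(n, pad_left, pad_right)
    return [row[reflect_index(j, pad_left, n)] for j in range(n + pad_left + pad_right)]


def pad_reflection_1d_backward(grad_out, pad_left, pad_right):
    """Backward: each output gradient is accumulated into the input element it was copied from."""
    n = len(grad_out) - pad_left - pad_right
    _check_pads(n, pad_left, pad_right)
    grad_in = [0.0] * n
    for j, g in enumerate(grad_out):
        grad_in[reflect_index(j, pad_left, n)] += g
    return grad_in


# Padding [3 5] on a last dimension of size 8, as in several benchmark cases.
padded = pad_reflection_1d_forward(list(range(8)), 3, 5)
print(padded)       # [3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2]
```

The backward pass is a scatter-add: input positions that are mirrored into the padding receive gradient from more than one output position, which is why a parallel implementation has to accumulate rather than simply copy.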
Average improvement over ROCm

| type | fwd | bwd |
|---|---|---|
| float | 1.48 | 1.73 |
| float16 | 1.40 | - |
| bfloat16 | - | - |

Detail Benchmark
fp32_fwd

| dtype | input_size | contiguous | padding | direction | improvement |
|---|---|---|---|---|---|
| float32 | [1024 256 8] | noncontiguous | [3 5] | fwd | 5.5431 |
| float32 | [1024 64 8] | noncontiguous | [3 5] | fwd | 4.3195 |
| float32 | [256 256 8] | noncontiguous | [3 5] | fwd | 4.3056 |
| float32 | [1024 256 16] | noncontiguous | [3 5] | fwd | 4.0766 |
| float32 | [1024 64 16] | noncontiguous | [3 5] | fwd | 3.5369 |
| float32 | [256 256 16] | noncontiguous | [3 5] | fwd | 3.3904 |
| float32 | [8 2 64] | noncontiguous | [2 2] | fwd | 3.1042 |
| float32 | [16 2 64] | noncontiguous | [2 2] | fwd | 2.7072 |
| float32 | [16 8 16] | noncontiguous | [2 2] | fwd | 2.5312 |
| float32 | [16 4 64] | noncontiguous | [3 5] | fwd | 2.4136 |
| float32 | [1024 16 8] | noncontiguous | [3 5] | fwd | 2.3902 |
| float32 | [32 8 16] | noncontiguous | [2 2] | fwd | 2.3676 |
| float32 | [8 8 16] | noncontiguous | [2 2] | fwd | 2.3544 |
| float32 | [16 4 16] | noncontiguous | [2 2] | fwd | 2.3374 |
| float32 | [32 2 16] | noncontiguous | [2 2] | fwd | 2.3363 |

fp16_fwd

| dtype | input_size | contiguous | padding | direction | improvement |
|---|---|---|---|---|---|
| float16 | [512 256 8] | noncontiguous | [3 5] | fwd | 5.042308539 |
| float16 | [512 256 16] | noncontiguous | [3 5] | fwd | 3.793880455 |
| float16 | [512 64 8] | noncontiguous | [3 5] | fwd | 3.229970638 |
| float16 | [128 256 8] | noncontiguous | [3 5] | fwd | 3.061361755 |
| float16 | [512 64 16] | noncontiguous | [3 5] | fwd | 2.99377916 |
| float16 | [16 4 16] | noncontiguous | [2 2] | fwd | 2.407523511 |
| float16 | [4 2 16] | noncontiguous | [2 2] | fwd | 2.333333333 |
| float16 | [32 2 16] | noncontiguous | [3 5] | fwd | 2.242424242 |
| float16 | [16 2 16] | noncontiguous | [2 2] | fwd | 2.07073955 |
| float16 | [8 4 16] | noncontiguous | [2 2] | fwd | 2.067524116 |
| float16 | [32 16 16] | noncontiguous | [3 5] | fwd | 1.971631206 |
| float16 | [512 16 16] | noncontiguous | [3 5] | fwd | 1.854805726 |
| float16 | [512 16 8] | noncontiguous | [3 5] | fwd | 1.847280335 |
| float16 | [128 64 16] | noncontiguous | [3 5] | fwd | 1.825203252 |
| float16 | [128 16 8] | noncontiguous | [3 5] | fwd | 1.816377171 |

fp32_bwd

| dtype | input_size | contiguous | padding | direction | improvement |
|---|---|---|---|---|---|
| float32 | [1024 256 8] | noncontiguous | [3 5] | bwd | 4.031944207 |
| float32 | [512 256 8] | contiguous | [3 5] | bwd | 3.871722166 |
| float32 | [512 256 16] | contiguous | [3 5] | bwd | 3.489330757 |
| float32 | [1024 256 16] | noncontiguous | [3 5] | bwd | 3.31064466 |
| float32 | [256 256 8] | noncontiguous | [3 5] | bwd | 2.98488121 |
| float32 | [128 256 8] | contiguous | [3 5] | bwd | 2.818461538 |
| float32 | [512 64 8] | contiguous | [3 5] | bwd | 2.809829603 |
| float32 | [512 64 16] | contiguous | [3 5] | bwd | 2.736646341 |
| float32 | [128 256 16] | contiguous | [3 5] | bwd | 2.733203505 |
| float32 | [256 256 16] | noncontiguous | [3 5] | bwd | 2.580315959 |
| float32 | [1024 64 8] | noncontiguous | [3 5] | bwd | 2.494184734 |
| float32 | [1024 64 16] | noncontiguous | [3 5] | bwd | 2.330557868 |
| float32 | [128 64 8] | contiguous | [3 5] | bwd | 1.495726496 |
| float32 | [512 16 8] | contiguous | [3 5] | bwd | 1.492063492 |
| float32 | [32 256 8] | contiguous | [3 5] | bwd | 1.475352113 |