Implement L1Loss
Opened by cognaiger9 1 year ago.
- Add the L1Loss operation with forward kernels for reduced outputs.
- Add a driver and gtests for the kernels.
- MIOpen performs better when:
  - the reduction mode is either `sum` or `mean`.
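
To show what a reduced L1Loss forward pass computes, here is a minimal CPU reference in Python. It assumes PyTorch-style semantics, where the loss is the sum or the mean of `|input - target|` over all elements. The function name `l1loss_forward_ref` and the flat-buffer-plus-strides layout are hypothetical. They mirror how a kernel might address both contiguous and noncontiguous tensors, and they are not MIOpen's API.

```python
from itertools import product
from typing import Sequence


def l1loss_forward_ref(
    input_buf: Sequence[float],
    target_buf: Sequence[float],
    shape: Sequence[int],
    input_strides: Sequence[int],
    target_strides: Sequence[int],
    reduction: str = "mean",
) -> float:
    """Hypothetical CPU reference for a reduced L1Loss forward pass.

    Each tensor is a flat buffer plus per-dimension strides (in elements), so a
    noncontiguous view (e.g. a transpose) is read the same way as a contiguous one.
    """
    if reduction not in ("sum", "mean"):
        raise ValueError("this sketch only covers the 'sum' and 'mean' reductions")

    total = 0.0
    count = 0
    # Visit every logical index and translate it to a buffer offset via strides.
    for idx in product(*(range(n) for n in shape)):
        i_off = sum(i * s for i, s in zip(idx, input_strides))
        t_off = sum(i * s for i, s in zip(idx, target_strides))
        total += abs(input_buf[i_off] - target_buf[t_off])
        count += 1

    if reduction == "sum":
        return total
    # 'mean' divides by the element count; an empty tensor yields NaN in this sketch.
    return total / count if count else float("nan")


# Logical input [[1, 2, 3], [4, 5, 6]] against an all-zero target.
x_contig = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # row-major, strides [3, 1]
x_transposed = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]  # same logical values, strides [1, 2]
zeros = [0.0] * 6

print(l1loss_forward_ref(x_contig, zeros, [2, 3], [3, 1], [3, 1], "sum"))       # 21.0
print(l1loss_forward_ref(x_transposed, zeros, [2, 3], [1, 2], [3, 1], "mean"))  # 3.5
```

The input can be stored contiguously (strides `[3, 1]`) or as a transposed view (strides `[1, 2]`). Both layouts give the same loss, which is the property the contiguous and noncontiguous benchmark cases exercise.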
Average improvement over ROCm (forward direction):

| dtype | fwd |
| --- | ---: |
| float16 | 1.92 |
| float32 | 1.93 |
| bfloat16 | 1.90 |

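
The averages appear to follow a simple calculation. Each per-case Improvement in the detailed benchmark below is the ROCm figure divided by the MIOpen figure, and each per-dtype average is the arithmetic mean of those ratios. The check below recomputes the float16 average from the 20 float16 rows. The helper names `improvement` and `average_improvement` are illustrative only.

```python
from statistics import mean

# (ROCm, MIOpen) pairs copied from the float16 rows of the detailed benchmark.
FLOAT16_ROWS = [
    (124259, 44677), (145427, 45850), (72674, 42117), (120067, 45157),
    (59538, 44730), (105651, 47433), (64209, 40695), (108066, 43752),
    (50754, 43059), (123586, 42455), (56545, 43325), (119331, 45050),
    (111459, 52446), (100514, 54828), (58818, 44392), (77970, 45352),
    (58737, 50312), (62626, 45352), (75186, 46934), (75698, 57193),
]


def improvement(rocm: float, miopen: float) -> float:
    """Per-case Improvement column: the ROCm figure divided by the MIOpen figure."""
    return rocm / miopen


def average_improvement(rows) -> float:
    """Arithmetic mean of the per-case improvements."""
    return mean(improvement(r, m) for r, m in rows)


print(f"{improvement(*FLOAT16_ROWS[0]):.2f}")      # 2.78
print(f"{average_improvement(FLOAT16_ROWS):.2f}")  # 1.92
```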
Detailed benchmark (Improvement = ROCm / MIOpen; values above 1 favor MIOpen)

float16

| op_name | dtype | size | contiguous | reduction | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
| L1Loss | float16 | [7 4] | contiguous | sum | fwd | 124259 | 44677 | 2.78 |
| L1Loss | float16 | [7 4] | noncontiguous | sum | fwd | 145427 | 45850 | 3.17 |
| L1Loss | float16 | [18 4] | contiguous | sum | fwd | 72674 | 42117 | 1.73 |
| L1Loss | float16 | [28 4] | contiguous | sum | fwd | 120067 | 45157 | 2.66 |
| L1Loss | float16 | [28 4] | noncontiguous | sum | fwd | 59538 | 44730 | 1.33 |
| L1Loss | float16 | [34 4] | noncontiguous | sum | fwd | 105651 | 47433 | 2.23 |
| L1Loss | float16 | [54 4] | contiguous | sum | fwd | 64209 | 40695 | 1.58 |
| L1Loss | float16 | [72 4] | contiguous | sum | fwd | 108066 | 43752 | 2.47 |
| L1Loss | float16 | [72 4] | noncontiguous | sum | fwd | 50754 | 43059 | 1.18 |
| L1Loss | float16 | [98 4] | noncontiguous | sum | fwd | 123586 | 42455 | 2.91 |
| L1Loss | float16 | [106 4] | contiguous | sum | fwd | 56545 | 43325 | 1.31 |
| L1Loss | float16 | [135 4] | contiguous | sum | fwd | 119331 | 45050 | 2.65 |
| L1Loss | float16 | [190 4] | noncontiguous | sum | fwd | 111459 | 52446 | 2.13 |
| L1Loss | float16 | [249 128] | contiguous | sum | fwd | 100514 | 54828 | 1.83 |
| L1Loss | float16 | [349 222] | contiguous | sum | fwd | 58818 | 44392 | 1.32 |
| L1Loss | float16 | [349 222] | noncontiguous | sum | fwd | 77970 | 45352 | 1.72 |
| L1Loss | float16 | [451 128] | contiguous | sum | fwd | 58737 | 50312 | 1.17 |
| L1Loss | float16 | [451 128] | noncontiguous | sum | fwd | 62626 | 45352 | 1.38 |
| L1Loss | float16 | [603 546] | contiguous | sum | fwd | 75186 | 46934 | 1.60 |
| L1Loss | float16 | [603 546] | noncontiguous | sum | fwd | 75698 | 57193 | 1.32 |


float32

| op_name | dtype | size | contiguous | reduction | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
| L1Loss | float32 | [7 4] | contiguous | sum | fwd | 81298 | 51255 | 1.59 |
| L1Loss | float32 | [7 4] | noncontiguous | sum | fwd | 57249 | 44713 | 1.28 |
| L1Loss | float32 | [18 4] | contiguous | sum | fwd | 104194 | 45122 | 2.31 |
| L1Loss | float32 | [28 4] | contiguous | sum | fwd | 55697 | 46224 | 1.20 |
| L1Loss | float32 | [28 4] | noncontiguous | sum | fwd | 118723 | 44161 | 2.69 |
| L1Loss | float32 | [34 4] | noncontiguous | sum | fwd | 58033 | 46650 | 1.24 |
| L1Loss | float32 | [54 4] | contiguous | sum | fwd | 123811 | 44001 | 2.81 |
| L1Loss | float32 | [72 4] | contiguous | sum | fwd | 60945 | 43308 | 1.41 |
| L1Loss | float32 | [72 4] | noncontiguous | sum | fwd | 113218 | 43735 | 2.59 |
| L1Loss | float32 | [98 4] | noncontiguous | sum | fwd | 73282 | 39147 | 1.87 |
| L1Loss | float32 | [106 4] | contiguous | sum | fwd | 110131 | 47041 | 2.34 |
| L1Loss | float32 | [135 4] | noncontiguous | sum | fwd | 114659 | 43130 | 2.66 |
| L1Loss | float32 | [190 4] | noncontiguous | sum | fwd | 78946 | 46810 | 1.69 |
| L1Loss | float32 | [207 4] | contiguous | sum | fwd | 109475 | 41245 | 2.65 |
| L1Loss | float32 | [207 4] | noncontiguous | sum | fwd | 45905 | 43219 | 1.06 |
| L1Loss | float32 | [249 128] | noncontiguous | sum | fwd | 133555 | 42952 | 3.11 |
| L1Loss | float32 | [349 222] | contiguous | sum | fwd | 53745 | 44836 | 1.20 |
| L1Loss | float32 | [451 128] | contiguous | sum | fwd | 119347 | 44676 | 2.67 |
| L1Loss | float32 | [451 128] | noncontiguous | sum | fwd | 58114 | 44375 | 1.31 |
| L1Loss | float32 | [603 546] | contiguous | sum | fwd | 64529 | 45992 | 1.40 |
| L1Loss | float32 | [603 546] | noncontiguous | sum | fwd | 75073 | 55557 | 1.35 |


bfloat16

| op_name | dtype | size | contiguous | reduction | direction | ROCm | MIOpen | Improvement |
| --- | --- | --- | --- | --- | --- | ---: | ---: | ---: |
| L1Loss | bfloat16 | [7 4] | contiguous | sum | fwd | 52609 | 45584 | 1.15 |
| L1Loss | bfloat16 | [18 4] | contiguous | sum | fwd | 112019 | 40624 | 2.76 |
| L1Loss | bfloat16 | [18 4] | noncontiguous | sum | fwd | 113763 | 48659 | 2.34 |
| L1Loss | bfloat16 | [28 4] | noncontiguous | sum | fwd | 97154 | 46846 | 2.07 |
| L1Loss | bfloat16 | [34 4] | contiguous | sum | fwd | 85330 | 43824 | 1.95 |
| L1Loss | bfloat16 | [54 4] | contiguous | sum | fwd | 89058 | 44944 | 1.98 |
| L1Loss | bfloat16 | [54 4] | noncontiguous | sum | fwd | 99987 | 44801 | 2.23 |
| L1Loss | bfloat16 | [72 4] | noncontiguous | sum | fwd | 103539 | 44108 | 2.35 |
| L1Loss | bfloat16 | [98 4] | contiguous | sum | fwd | 79794 | 43877 | 1.82 |
| L1Loss | bfloat16 | [98 4] | noncontiguous | sum | fwd | 47489 | 42686 | 1.11 |
| L1Loss | bfloat16 | [106 4] | contiguous | sum | fwd | 128467 | 44979 | 2.86 |
| L1Loss | bfloat16 | [106 4] | noncontiguous | sum | fwd | 97250 | 45086 | 2.16 |
| L1Loss | bfloat16 | [135 4] | noncontiguous | sum | fwd | 109411 | 42953 | 2.55 |
| L1Loss | bfloat16 | [190 4] | contiguous | sum | fwd | 74913 | 47112 | 1.59 |
| L1Loss | bfloat16 | [207 4] | contiguous | sum | fwd | 116563 | 45939 | 2.54 |
| L1Loss | bfloat16 | [207 4] | noncontiguous | sum | fwd | 84306 | 45317 | 1.86 |
| L1Loss | bfloat16 | [349 222] | contiguous | sum | fwd | 79554 | 52535 | 1.51 |
| L1Loss | bfloat16 | [349 222] | noncontiguous | sum | fwd | 60081 | 44926 | 1.34 |
| L1Loss | bfloat16 | [451 128] | contiguous | sum | fwd | 58866 | 44392 | 1.33 |
| L1Loss | bfloat16 | [451 128] | noncontiguous | sum | fwd | 96659 | 44943 | 2.15 |
| L1Loss | bfloat16 | [603 546] | contiguous | sum | fwd | 78369 | 54686 | 1.43 |
| L1Loss | bfloat16 | [603 546] | noncontiguous | sum | fwd | 74386 | 54117 | 1.37 |
| L1Loss | bfloat16 | [1024 1024] | contiguous | sum | fwd | 83682 | 70349 | 1.19 |