onnx-mlir
Optimization opportunity for GlobalAveragePool
Hi, I'm new to ONNX-MLIR.
I converted a MobileNet v3 model into MLIR with the command ./onnx-mlir --EmitMLIR -O3 ./mobilenetv3.onnx
and got the following MLIR code for the GlobalAveragePool op:
affine.for %arg1 = 0 to 1 {
  affine.for %arg2 = 0 to 960 {
    affine.for %arg3 = 0 to 7 {
      affine.for %arg4 = 0 to 7 {
        %128 = affine.load %alloc_357[%arg1, %arg2, %arg3, %arg4] : memref<1x960x7x7xf32>
        %129 = affine.load %alloc_358[%arg1, %arg2, %c0, %c0] : memref<1x960x1x1xf32>
        %130 = arith.addf %129, %128 : f32
        affine.store %130, %alloc_358[%arg1, %arg2, %c0, %c0] : memref<1x960x1x1xf32>
      }
    }
  }
}
affine.for %arg1 = 0 to 1 {
  affine.for %arg2 = 0 to 960 {
    affine.for %arg3 = 0 to 1 {
      affine.for %arg4 = 0 to 1 {
        %128 = affine.load %alloc_358[%arg1, %arg2, %arg3, %arg4] : memref<1x960x1x1xf32>
        %129 = arith.divf %128, %cst : f32
        affine.store %129, %alloc_358[%arg1, %arg2, %arg3, %arg4] : memref<1x960x1x1xf32>
      }
    }
  }
}
It looks like the first affine.for nest accumulates the values, and the second one divides the sums by the pooling area (7 × 7 = 49) to compute the average.
I would expect the two loop nests to be fusable into one, but I found that even the LLVM IR and the generated binary literally contain two loop nests. I'm not sure why this isn't optimized away. Is this expected behavior?
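To make the expected transformation concrete, here is a small NumPy sketch (the function names and shapes are my own, for illustration only) of the two forms: the two-pass structure the generated code uses, and the fused single-nest version where the division folds into the accumulation loop and the second sweep over the 1x960x1x1 buffer disappears.

```python
import numpy as np

def global_avg_pool_two_pass(x):
    # Mirrors the generated code: pass 1 accumulates into a 1x1 output,
    # pass 2 sweeps the output again to divide by the pooling area.
    n, c, h, w = x.shape
    out = np.zeros((n, c, 1, 1), dtype=x.dtype)
    for i in range(n):
        for j in range(c):
            for k in range(h):
                for l in range(w):
                    out[i, j, 0, 0] += x[i, j, k, l]
    out /= np.float32(h * w)          # second loop nest in the MLIR dump
    return out

def global_avg_pool_fused(x):
    # Fused form: accumulate in a scalar and divide once per channel,
    # so only a single loop nest touches memory.
    n, c, h, w = x.shape
    out = np.empty((n, c, 1, 1), dtype=x.dtype)
    area = np.float32(h * w)
    for i in range(n):
        for j in range(c):
            acc = np.float32(0.0)
            for k in range(h):
                for l in range(w):
                    acc += x[i, j, k, l]
            out[i, j, 0, 0] = acc / area
    return out
```

Both produce the same result as x.mean(axis=(2, 3), keepdims=True); the fused version additionally keeps the running sum in a register instead of re-loading and re-storing the output element on every iteration.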