onnx-mlir

Optimization opportunity for GlobalAveragePool

Open maekawatoshiki opened this issue 1 year ago • 2 comments

Hi. I'm new to ONNX-MLIR.

I converted a MobileNet v3 model into MLIR with the command ./onnx-mlir --EmitMLIR -O3 ./mobilenetv3.onnx, and got the following MLIR code for the GlobalAveragePool op:

    affine.for %arg1 = 0 to 1 {
      affine.for %arg2 = 0 to 960 {
        affine.for %arg3 = 0 to 7 {
          affine.for %arg4 = 0 to 7 {
            %128 = affine.load %alloc_357[%arg1, %arg2, %arg3, %arg4] : memref<1x960x7x7xf32>
            %129 = affine.load %alloc_358[%arg1, %arg2, %c0, %c0] : memref<1x960x1x1xf32>
            %130 = arith.addf %129, %128 : f32
            affine.store %130, %alloc_358[%arg1, %arg2, %c0, %c0] : memref<1x960x1x1xf32>
          }
        }
      }
    }
    affine.for %arg1 = 0 to 1 {
      affine.for %arg2 = 0 to 960 {
        affine.for %arg3 = 0 to 1 {
          affine.for %arg4 = 0 to 1 {
            %128 = affine.load %alloc_358[%arg1, %arg2, %arg3, %arg4] : memref<1x960x1x1xf32>
            %129 = arith.divf %128, %cst : f32
            affine.store %129, %alloc_358[%arg1, %arg2, %arg3, %arg4] : memref<1x960x1x1xf32>
          }
        }
      }
    }

It looks like the first affine.for nest accumulates the values and the second one divides each sum by the area to compute the average.

I guess the two loop nests could be fused into one, but I found that even the LLVM IR and the generated binary literally contain two loop nests. I'm not sure why they aren't fused. Is this expected behavior?
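To make the question concrete, here is a rough Python sketch of the fusion I have in mind. It is only an illustration, not what ONNX-MLIR does or should emit: the function names are made up, the shapes are taken from the memref types above, and it assumes that the output buffer starts at zero and that %cst holds the area (7 * 7 = 49), neither of which is visible in the snippet.

```python
# Shapes taken from the memref types in the MLIR above: 1x960x7x7 -> 1x960x1x1.
N, C, H, W = 1, 960, 7, 7


def global_avg_pool_two_nests(x):
    """Mirrors the emitted code: one nest accumulates, a second nest divides."""
    # Assumption: the output buffer is zero-initialized before the first nest.
    out = [[0.0 for _ in range(C)] for _ in range(N)]
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    # Load/add/store on the output element in every iteration.
                    out[n][c] = out[n][c] + x[n][c][h][w]
    area = float(H * W)  # Assumption: this is the value of %cst.
    for n in range(N):
        for c in range(C):
            out[n][c] = out[n][c] / area
    return out


def global_avg_pool_fused(x):
    """Hypothetical fused form: accumulate in a local, divide, store once."""
    out = [[0.0 for _ in range(C)] for _ in range(N)]
    area = float(H * W)
    for n in range(N):
        for c in range(C):
            acc = 0.0
            for h in range(H):
                for w in range(W):
                    acc = acc + x[n][c][h][w]
            # The division moves into the channel loop, after the reduction.
            out[n][c] = acc / area
    return out


def make_input():
    """Deterministic test data with small integer-valued floats."""
    return [[[[float((c * 31 + h * 7 + w) % 13) for w in range(W)]
              for h in range(H)]
             for c in range(C)]
            for n in range(N)]
```

Both versions add the elements in the same order, so they should produce identical results. The fused version touches each output element once instead of reading and writing it on every inner iteration and then again in the second nest.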

maekawatoshiki · May 14 '23 05:05