[compile]: One or more operations with large vector sizes (16384 bytes) were found: for QuantizeLinear operation

pdhirajkumarprasad opened this issue 1 year ago · 4 comments

What happened?

For the given IR, compilation fails with the error "One or more operations with large vector sizes (16384 bytes) were found".

This may be related to https://github.com/iree-org/iree/issues/18005. If the fix for 18005 also fixes the issue below, this one can be closed.

module {
  func.func @torch_jit(%arg0: !torch.vtensor<[1,3,256,256],f32>, %arg1: !torch.vtensor<[1,3,256,256],f32>, %arg2: !torch.vtensor<[4],si64>, %arg3 : !torch.vtensor<[2],si64>, %arg4 : !torch.vtensor<[1],si64>, %arg5 : !torch.vtensor<[32],si64>) -> (!torch.vtensor<[1024,7,7,1],si8> ) attributes {torch.onnx_meta.ir_version = 8 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.13.1"} {
    %6481 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__246> : tensor<si64>} : () -> !torch.vtensor<[],si64> 
    %2051 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__1077> : tensor<32xsi64>} : () -> !torch.vtensor<[32],si64> 
    %2073 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__1084> : tensor<1xsi64>} : () -> !torch.vtensor<[1],si64> 
    %558 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__182> : tensor<1x7x7x2xf32>} : () -> !torch.vtensor<[1,7,7,2],f32> 
    %668 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__259> : tensor<2xsi64>} : () -> !torch.vtensor<[2],si64> 
    %2118 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__1108> : tensor<f32>} : () -> !torch.vtensor<[],f32> 
    %2119 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__1109> : tensor<si8>} : () -> !torch.vtensor<[],si8> 

    %2031 = torch.operator "onnx.Unsqueeze"(%6481, %arg4) : (!torch.vtensor<[],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1],si64> 
    %2038 = torch.operator "onnx.Concat"(%2031, %2031, %2031, %2031) {torch.onnx.axis = 0 : si64} : (!torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[4],si64> 

    %2055 = torch.operator "onnx.Reshape"(%2051, %arg4) {torch.onnx.allowzero = 0 : si64} : (!torch.vtensor<[32],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[32],si64> 
    %2065 = torch.operator "onnx.Reshape"(%2055, %arg3) {torch.onnx.allowzero = 0 : si64} : (!torch.vtensor<[32],si64>, !torch.vtensor<[2],si64>) -> !torch.vtensor<[1,32],si64> 
    %2066 = torch.operator "onnx.Expand"(%2065, %arg3) : (!torch.vtensor<[1,32],si64>, !torch.vtensor<[2],si64>) -> !torch.vtensor<[32,32],si64> 
    %2068 = torch.operator "onnx.Unsqueeze"(%2066, %2073) : (!torch.vtensor<[32,32],si64>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,32,32],si64> 
    %2071 = torch.operator "onnx.Concat"(%2068, %2068) {torch.onnx.axis = 0 : si64} : (!torch.vtensor<[1,32,32],si64>, !torch.vtensor<[1,32,32],si64>) -> !torch.vtensor<[2,32,32],si64> 
    %2072 = torch.operator "onnx.Cast"(%2071) {torch.onnx.to = 1 : si64} : (!torch.vtensor<[2,32,32],si64>) -> !torch.vtensor<[2,32,32],f32> 
    %2074 = torch.operator "onnx.Unsqueeze"(%2072, %2073) : (!torch.vtensor<[2,32,32],f32>, !torch.vtensor<[1],si64>) -> !torch.vtensor<[1,2,32,32],f32> 
    %2075 = torch.operator "onnx.Shape"(%arg2) : (!torch.vtensor<[4],si64>) -> !torch.vtensor<[1],si64> 
    %2076 = torch.operator "onnx.ConstantOfShape"(%2075) {torch.onnx.value = dense_resource<__1085> : tensor<1xsi64>} : (!torch.vtensor<[1],si64>) -> !torch.vtensor<[4],si64> 
    %2077 = torch.operator "onnx.Expand"(%2074, %2076) : (!torch.vtensor<[1,2,32,32],f32>, !torch.vtensor<[4],si64>) -> !torch.vtensor<[?,2,32,32],f32> 
    %2078 = torch.operator "onnx.Tile"(%2077, %2038) : (!torch.vtensor<[?,2,32,32],f32>, !torch.vtensor<[4],si64>) -> !torch.vtensor<[?,?,?,?],f32> 
    %2079 = torch.operator "onnx.Cast"(%2078) {torch.onnx.to = 1 : si64} : (!torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> 
    %2082 = torch.operator "onnx.QuantizeLinear"(%2079, %2118, %2119) : (!torch.vtensor<[?,?,?,?],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[?,?,?,?],si8> 
    %2085 = torch.operator "onnx.DequantizeLinear"(%2082, %2118, %2119) : (!torch.vtensor<[?,?,?,?],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[?,?,?,?],f32> 
    %2089 = torch.operator "onnx.Transpose"(%2085) {torch.onnx.perm = [0 : si64, 2 : si64, 3 : si64, 1 : si64]} : (!torch.vtensor<[?,?,?,?],f32>) -> !torch.vtensor<[?,?,?,?],f32> 
    %561 = torch.operator "onnx.QuantizeLinear"(%558, %2118, %2119) : (!torch.vtensor<[1,7,7,2],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1,7,7,2],si8> 
    %564 = torch.operator "onnx.DequantizeLinear"(%561, %2118, %2119) : (!torch.vtensor<[1,7,7,2],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1,7,7,2],f32> 
    %2100 = torch.operator "onnx.Reshape"(%2089, %arg2) {torch.onnx.allowzero = 0 : si64} : (!torch.vtensor<[?,?,?,?],f32>, !torch.vtensor<[4],si64>) -> !torch.vtensor<[1024,1,1,2],f32> 
    %2102 = torch.operator "onnx.Mul"(%564, %2118) : (!torch.vtensor<[1,7,7,2],f32>, !torch.vtensor<[],f32>) -> !torch.vtensor<[1,7,7,2],f32> 
    %2103 = torch.operator "onnx.Add"(%2100, %2102) : (!torch.vtensor<[1024,1,1,2],f32>, !torch.vtensor<[1,7,7,2],f32>) -> !torch.vtensor<[1024,7,7,2],f32> 
    %2106 = torch.operator "onnx.QuantizeLinear"(%2103, %2118, %2119) : (!torch.vtensor<[1024,7,7,2],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1024,7,7,2],si8> 
    %2109 = torch.operator "onnx.DequantizeLinear"(%2106, %2118, %2119) : (!torch.vtensor<[1024,7,7,2],si8>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1024,7,7,2],f32> 
    %2111:2 = torch.operator "onnx.Split"(%2109, %668) {torch.onnx.axis = -1 : si64} : (!torch.vtensor<[1024,7,7,2],f32>, !torch.vtensor<[2],si64>) -> (!torch.vtensor<[1024,7,7,1],f32>, !torch.vtensor<[1024,7,7,1],f32>) 
    %2120 = torch.operator "onnx.QuantizeLinear"(%2111#0, %2118, %2119) : (!torch.vtensor<[1024,7,7,1],f32>, !torch.vtensor<[],f32>, !torch.vtensor<[],si8>) -> !torch.vtensor<[1024,7,7,1],si8> 
    return %2120 : !torch.vtensor<[1024,7,7,1],si8>
  }
}

{-#
  dialect_resources: {
    builtin: {
      __246: "0x080000000100000000000000",
      __1077: "0x0800000000000000000000000100000000000000020000000000000003000000000000000400000000000000050000000000000006000000000000000700000000000000080000000000000009000000000000000A000000000000000B000000000000000C000000000000000D000000000000000E000000000000000F0000000000000010000000000000001100000000000000120000000000000013000000000000001400000000000000150000000000000016000000000000001700000000000000180000000000000019000000000000001A000000000000001B000000000000001C000000000000001D000000000000001E000000000000001F00000000000000",
      __182: "0x08000000000040C0000040C0000040C0000000C0000040C0000080BF000040C000000000000040C00000803F000040C000000040000040C000004040000000C0000040C0000000C0000000C0000000C0000080BF000000C000000000000000C00000803F000000C000000040000000C000004040000080BF000040C0000080BF000000C0000080BF000080BF000080BF00000000000080BF0000803F000080BF00000040000080BF0000404000000000000040C000000000000000C000000000000080BF0000000000000000000000000000803F000000000000004000000000000040400000803F000040C00000803F000000C00000803F000080BF0000803F000000000000803F0000803F0000803F000000400000803F0000404000000040000040C000000040000000C000000040000080BF0000004000000000000000400000803F0000004000000040000000400000404000004040000040C000004040000000C000004040000080BF0000404000000000000040400000803F00004040000000400000404000004040",
      __1084: "0x080000000000000000000000",
      __1108: "0x080000000000003F",
      __1109: "0x0800000000",
      __1085: "0x080000000100000000000000",
      __259: "0x0800000001000000000000000100000000000000"
    }
  }
#-}
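
For context, the onnx.QuantizeLinear / onnx.DequantizeLinear pairs above use a per-tensor scale and zero point; decoding the resources, __1108 is the f32 scale 0.5 and __1109 is the si8 zero point 0. Below is a minimal NumPy sketch of the standard ONNX Q/DQ semantics (an illustration of the operator definitions, not of IREE's lowering):

import numpy as np

# Sketch of ONNX QuantizeLinear/DequantizeLinear semantics (opset 21),
# with scale 0.5 and zero point 0 as decoded from __1108/__1109.

def quantize_linear(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Round half-to-even, shift by the zero point, saturate to the si8 range.
    q = np.rint(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_linear(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Undo the zero-point shift, then rescale back to f32.
    return (q.astype(np.float32) - np.float32(zero_point)) * np.float32(scale)

x = np.random.randn(1024, 7, 7, 2).astype(np.float32)
roundtrip = dequantize_linear(quantize_linear(x, 0.5, 0), 0.5, 0)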

Steps to reproduce your issue

Command to reproduce the issue:

iree-compile --mlir-print-debuginfo --mlir-print-op-on-diagnostic=false model.torch_onnx.mlir --iree-hal-target-backends=llvm-cpu

IREE version:

IREE compiler version 20240819.990 @ aeda14995f16ed1302db616adf0c03acf80f27ee
LLVM version 20.0.0git

Failing stage:

// -----// IR Dump After LLVMCPUVerifyVectorSizeLegalityPass Failed (iree-llvmcpu-verify-vector-size-legality) //----- //
func.func @jit_eval_7_dispatch_0_elementwise_1024x7x7x2_i8xf32() attributes {translation_info = #iree_codegen.translation_info<CPUDoubleTilingExpert, {enable_loop_peeling}>} {
  %cst = arith.constant dense<1.270000e+02> : vector<1x1x7xf32> loc(unknown)
  %cst_0 = arith.constant dense<-1.280000e+02> : vector<1x1x7xf32> loc(unknown)
  %cst_1 = arith.constant dense<0.000000e+00> : vector<1x1x7xf32> loc(unknown)
  %cst_2 = arith.constant dense<5.000000e-01> : vector<1x1x7xf32> loc(unknown)
  %cst_3 = arith.constant dense<5.000000e-01> : vector<1024x7x7x2xf32> loc(unknown)
  %c0_i8 = arith.constant 0 : i8 loc(unknown)
  %c1 = arith.constant 1 : index loc(unknown)
  %c7 = arith.constant 7 : index loc(unknown)
  %c128 = arith.constant 128 : index loc(unknown)
  %c1024 = arith.constant 1024 : index loc(unknown)
  %cst_4 = arith.constant 0.000000e+00 : f32 loc(unknown)
  %c0 = arith.constant 0 : index loc(unknown)
  %0 = hal.interface.binding.subspan layout(<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, "ReadOnly|Indirect">, <1, storage_buffer, Indirect>], flags = Indirect>]>) set(0) binding(0) alignment(64) offset(%c0) flags("ReadOnly|Indirect") : !flow.dispatch.tensor<readonly:tensor<1024x7x7x2xi8>> loc("t1.mlir":37:13)
  %1 = hal.interface.binding.subspan layout(<push_constants = 0, sets = [<0, bindings = [<0, storage_buffer, "ReadOnly|Indirect">, <1, storage_buffer, Indirect>], flags = Indirect>]>) set(0) binding(1) alignment(64) offset(%c0) flags(Indirect) : !flow.dispatch.tensor<writeonly:tensor<1024x7x7xi8>> loc(callsite("t1.mlir":37:13 at "t1.mlir":37:13))
  %2 = flow.dispatch.tensor.load %0, offsets = [0, 0, 0, 0], sizes = [1024, 7, 7, 2], strides = [1, 1, 1, 1] : !flow.dispatch.tensor<readonly:tensor<1024x7x7x2xi8>> -> tensor<1024x7x7x2xi8> loc(callsite("t1.mlir":37:13 at "t1.mlir":37:13))
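
The problematic value is visible in the dump: %cst_3 is a single vector<1024x7x7x2xf32> constant, i.e. the elementwise dispatch is being vectorized over the entire 1024x7x7x2 tensor at once. A quick size check (taking the 16384-byte limit quoted in the diagnostic at face value):

import math

MAX_VECTOR_BYTES = 16384            # limit quoted in the diagnostic
shape = (1024, 7, 7, 2)             # vector<1024x7x7x2xf32> from the dump
size_bytes = math.prod(shape) * 4   # f32 is 4 bytes per element
print(size_bytes)                   # 401408 bytes, roughly 24x over the limit
assert size_bytes > MAX_VECTOR_BYTES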

What component(s) does this issue relate to?

Frontends

Version information

No response

Additional context

No response

pdhirajkumarprasad avatar Aug 13 '24 12:08 pdhirajkumarprasad

I looked at it a bit; it is a duplicate of the problem mentioned in https://github.com/iree-org/iree/issues/18005#issuecomment-2278885607, but that isn't the initial problem identified in that issue. Maybe worth keeping this one open?

IanWood1 avatar Aug 13 '24 16:08 IanWood1

@PhaneeshB @pashu123

I remember you guys were talking about this kind of issue. What is the current status?

zjgarvey avatar Aug 13 '24 19:08 zjgarvey

> @PhaneeshB @pashu123
>
> I remember you guys were talking about this kind of issue. What is the current status?

One patch got merged https://github.com/iree-org/iree/pull/18114/files (which solves the issue with dequant + conv cases).

pashu123 avatar Aug 13 '24 19:08 pashu123

> @PhaneeshB @pashu123 I remember you guys were talking about this kind of issue. What is the current status?
>
> One patch got merged https://github.com/iree-org/iree/pull/18114/files (which solves the issue with dequant + conv cases).

Okay, that does seem to resolve the issue with deeplabv3, so thanks for that.

There are still several models failing because of the broader large-vector-sizes issue with quantized models. In addition to the IR shared here, two protected Win24 models are still failing with a large-vector-sizes issue, so I'll try to narrow down some reproducing IR that doesn't include protected info and share it.

I feel like there is some kind of more fundamental issue with the stack allocation when dequants get fused with other ops.

zjgarvey avatar Aug 14 '24 18:08 zjgarvey

The issue is not present with the latest build, so I am closing this one. If we see a similar issue in any other model, we will open a new issue.

pdhirajkumarprasad avatar Sep 13 '24 04:09 pdhirajkumarprasad