onnx-mlir
Usage of --shapeInformation results in different outcome depending on the input values
Looking at the bertsquad-12.onnx model (https://github.com/onnx/models/tree/main/text/machine_comprehension/bert-squad/model):
When I do shape inference with the `?` dynamic values replaced by 4, the final model is fully shaped with constant shapes:

```shell
> onnx-mlir -O3 -v -EmitONNXIR bertsquad-12.onnx --shapeInformation="0:4,1:4x256,2:4x256,3:4x256"
> grep "?" bertsquad-12.tmp | wc -l
0
```
But when the `?` values are replaced by 1, not all of the shapes are compile-time shapes:

```shell
> onnx-mlir -O3 -v -EmitONNXIR bertsquad-12.onnx --shapeInformation="0:1,1:1x256,2:1x256,3:1x256"
> grep "?" bertsquad-12.tmp | wc -l
9
```
To me that does not totally make sense. It would be nice to have someone investigate the discrepancy.
@negiyas any insight on what is happening?
Please look at the following outputs for both cases.

**In the dynamic value = 1 case:**
```mlir
%1110 = "onnx.Mul"(%1096, %1104) : (tensor<256x768xf32>, tensor<256x1xf32>) -> tensor<256x768xf32>
%1111 = "onnx.Mul"(%1110, %1105) : (tensor<256x768xf32>, tensor<768xf32>) -> tensor<256x768xf32>
%1112 = "onnx.Add"(%1111, %1109) {onnx_node_name = "bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1"} : (tensor<256x768xf32>, tensor<256x768xf32>) -> tensor<256x768xf32>
%1113 = "onnx.Constant"() {value = dense<[256, 768]> : tensor<2xi64>} : () -> tensor<2xi64>
%1114 = "onnx.Constant"() {value = dense<[1, 256, 2]> : tensor<3xi64>} : () -> tensor<3xi64>
%1115 = "onnx.Reshape"(%1112, %1113) {allowzero = 0 : si64} : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>
%1116 = "onnx.Constant"() {value = dense<[1.25070888E-4, 1.26759609E-4]> : tensor<2xf32>} : () -> tensor<2xf32>
%1117 = "onnx.Gemm"(%1115, %92, %1116) {alpha = 1.000000e+00 : f32, beta = 1.000000e+00 : f32, transA = 0 : si64, transB = 0 : si64} : (tensor<?x?xf32>, tensor<768x2xf32>, tensor<2xf32>) -> tensor<?x2xf32>
```
**In the dynamic value = 4 case:**
```mlir
%1110 = "onnx.Mul"(%1096, %1104) : (tensor<1024x768xf32>, tensor<1024x1xf32>) -> tensor<1024x768xf32>
%1111 = "onnx.Mul"(%1110, %1105) : (tensor<1024x768xf32>, tensor<768xf32>) -> tensor<1024x768xf32>
%1112 = "onnx.Add"(%1111, %1109) {onnx_node_name = "bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1"} : (tensor<1024x768xf32>, tensor<1024x768xf32>) -> tensor<1024x768xf32>
%1113 = "onnx.Constant"() {value = dense<[4, 256, 2]> : tensor<3xi64>} : () -> tensor<3xi64>
%1114 = "onnx.Constant"() {value = dense<[1.25070888E-4, 1.26759609E-4]> : tensor<2xf32>} : () -> tensor<2xf32>
%1115 = "onnx.Gemm"(%1112, %92, %1114) {alpha = 1.000000e+00 : f32, beta = 1.000000e+00 : f32, transA = 0 : si64, transB = 0 : si64} : (tensor<1024x768xf32>, tensor<768x2xf32>, tensor<2xf32>) -> tensor<1024x2xf32>
```
Findings are as follows.
(1) In the dynamic value=1 case, the onnx.Constant and onnx.Reshape ops in %1113 and %1115 should be removed, because the input tensor shape (tensor<256x768xf32>) and the specified shape ([256, 768]) are identical.
(2) In the dynamic value=4 case, the onnx.Constant and onnx.Reshape ops are removed correctly.
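For illustration, the identity condition behind finding (1) can be sketched in Python (a hypothetical helper, not the actual onnx-mlir code): a Reshape is a no-op exactly when the statically known input shape equals the constant target shape.

```python
# Sketch of the identity-reshape condition (illustrative only, not the
# onnx-mlir implementation): a Reshape can be removed when the statically
# known input shape equals the constant target shape.
def is_identity_reshape(input_shape, target_shape):
    # Both shapes must be fully static (no unknown/-1 dims) to compare.
    if any(d < 0 for d in list(input_shape) + list(target_shape)):
        return False
    return list(input_shape) == list(target_shape)

# %1115 in the value=1 case: 256x768 reshaped with target [256, 768] -> removable
assert is_identity_reshape([256, 768], [256, 768])
# But if the pattern is handed a 3-D shape like [1, 256, 768] -> not removable
assert not is_identity_reshape([256, 768], [1, 256, 768])
```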
I am investigating why the RemoveIdentityReshapePattern in src/Dialect/ONNX/Rewrite.td does not work for the dynamic value=1 case. Your comments and suggestions are very welcome!
Maybe it needs another call of shape-inference. Have you tried to use repeatOnnxTransform, e.g., set it to 1 or more, to repeatedly call shape-inference and constant propagation?
Looking at this:

```mlir
%1113 = "onnx.Constant"() {value = dense<[256, 768]> : tensor<2xi64>} : () -> tensor<2xi64>
%1114 = "onnx.Constant"() {value = dense<[1, 256, 2]> : tensor<3xi64>} : () -> tensor<3xi64>
%1115 = "onnx.Reshape"(%1112, %1113) {allowzero = 0 : si64} : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>
```
I believe Reshape needs a call of shape-inference to get its correct output dimensions.
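For reference, ONNX Reshape lets the output shape be computed from a constant shape tensor; a minimal sketch of that inference (per the ONNX operator spec, not the onnx-mlir code) looks like this:

```python
# Minimal sketch of ONNX Reshape output-shape inference from a constant
# shape tensor (illustrative, not the onnx-mlir implementation).
# Per the ONNX spec: a 0 copies the corresponding input dim (when
# allowzero=0), and a single -1 is inferred from the element count.
def infer_reshape_shape(input_shape, target, allowzero=0):
    out = []
    for i, d in enumerate(target):
        if d == 0 and allowzero == 0:
            out.append(input_shape[i])  # copy dim from input
        else:
            out.append(d)
    if -1 in out:
        known = 1
        for d in out:
            if d != -1:
                known *= d
        total = 1
        for d in input_shape:
            total *= d
        out[out.index(-1)] = total // known  # infer the remaining dim
    return out

# %1115: 256x768 input reshaped with target [256, 768] -> [256, 768]
assert infer_reshape_shape([256, 768], [256, 768]) == [256, 768]
assert infer_reshape_shape([1, 256, 768], [256, -1]) == [256, 768]
```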
I tried repeatOnnxTransform=10, but the results were the same. Is there any reason why it works for the dynamic value=4 case, but does not work for the dynamic value=1 case?
I am looking at the RemoveIdentityReshapePattern now.
```tablegen
def RemoveIdentityReshapePattern: Pat<
  // Remove an identity pattern. Input tensor already has the specified shape.
  (ONNXReshapeOp $val, $shape, $az),
  // Remove the transpose.
  (replaceWithValue $val),
  // Check that val has the specified shape.
  [(HasSpecifiedConstantShape $val, $shape)]>;
```
And I am investigating inputs of the HasSpecifiedConstantShape function for the onnx.Reshape op that should be removed.
(1) In the dynamic value=1 case, HasSpecifiedConstantShape receives a 256x768xf32 tensor as the value and [1, 256, 768] as the shape, and returns false, because the value tensor is 2-D and the shape is 3-D. I suppose that the shape [1, 256, 768] passed to the function is wrong and should be [256, 768]. (If [256, 768] is passed as the shape, it returns true, and the onnx.Reshape op is removed as in the dynamic value=4 case.)
(2) In the dynamic value=4 case, HasSpecifiedConstantShape receives a 1024x768xf32 tensor as the value and [1024, 768] as the shape, and returns true. That seems reasonable.
In the dynamic value=1 case:

```
ZZZ value[256, 768, ] => shape[1, 256, 12, 64, ] return FALSE
ZZZ value[256, 768, ] => shape[1, 256, 12, 64, ] return FALSE
ZZZ value[256, 768, ] => shape[1, 256, 12, 64, ] return FALSE
ZZZ value[1, 256, 12, 64, ] => shape[256, 768, ] return FALSE
ZZZ value[256, 768, ] => shape[1, 256, 768, ] return FALSE  # for %1115 = "onnx.Reshape"(%1112, %1113) {allowzero = 0 : si64} : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>
```

In the dynamic value=4 case:

```
ZZZ value[1024, 768, ] => shape[4, 256, 12, 64, ] return FALSE
ZZZ value[1024, 768, ] => shape[4, 256, 12, 64, ] return FALSE
ZZZ value[1024, 768, ] => shape[4, 256, 12, 64, ] return FALSE
ZZZ value[4, 256, 12, 64, ] => shape[1024, 768, ] return FALSE
ZZZ value[1024, 768, ] => shape[1024, 768, ] return TRUE  # and the onnx.Reshape op is removed
```
I am investigating why the shape [1, 256, 768] (which should be [256, 768]) is passed to HasSpecifiedConstantShape in the dynamic value=1 case.
Your comments and suggestions are very welcome!
> Please look at the following outputs of the both cases.
Good to see that you are finding inefficiencies that can be further optimized in the ?=1 case. Please try to also look at why we have more dyn cases that remain in the ?=1 case, which was the original issue with shape inference. We really need to remove as many dyn refs as possible for NNPA.
@AlexandreEichenberger Thank you for your comments. I found that "--onnx-op-transform-threshold=5" (not "--repeatOnnxTransform=5") option can solve this issue. This option can change the maximum number of transform pass repeat times.
The default value is currently 3, but we need at least 5 iterations in this case. A larger default value (e.g., 5 or 10) does not cause a big issue, because the transform pass stops if a pass does not change the target code at all, as you know. I propose changing the default value from 3 to 5 (or more) to avoid similar issues.
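The behavior described above can be sketched as a fixed-point loop (a simplified model with illustrative names, not the actual ONNXOpTransformPass code): run up to `threshold` iterations, but stop early as soon as an iteration leaves the IR unchanged.

```python
# Simplified model of the repeated transform pipeline (illustrative
# names, not the actual onnx-mlir code): iterate up to `threshold`
# times, stopping early once an iteration changes nothing.
def run_transform_pipeline(ir, transform, threshold=3):
    iterations = 0
    for _ in range(threshold):
        new_ir = transform(ir)
        iterations += 1
        if new_ir == ir:  # fixed point reached: stop early
            break
        ir = new_ir
    return ir, iterations

# Toy transform that needs 5 changing steps to converge:
step = lambda n: max(n - 1, 0)

_, iters = run_transform_pipeline(5, step, threshold=3)
assert iters == 3  # threshold=3 gives up before convergence

result, iters = run_transform_pipeline(5, step, threshold=10)
assert result == 0 and iters == 6  # 5 changing passes + 1 no-change pass
```

This is why raising the threshold is cheap: once the IR stabilizes, the loop exits regardless of how large the threshold is.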
Your comments and suggestions are very welcome. Thank you.
In the previous comment:

> I tried repeatOnnxTransform=10, but the results were the same.

and now:

> I found that "--repeatOnnxTransform=5" option can solve this issue.

Does it mean value 5 worked but 10 did not?
I am sorry, I used the "--onnx-op-transform-threshold=5" (not "--repeatOnnxTransform=5") option. (I updated my previous comments.) It seems that the "--repeatOnnxTransform" option is used only when the "onnxOpTransformThreshold" value is <= 0, which is set by the "--onnx-op-transform-threshold" option (default value is 3).
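As I understand the interaction described above, the option selection looks roughly like this (a sketch based on that description, not the actual onnx-mlir code):

```python
# Sketch of how the two options appear to interact, based on the
# description above (illustrative, not the actual onnx-mlir code):
# --repeatOnnxTransform is consulted only when
# --onnx-op-transform-threshold is set to a value <= 0.
def effective_repeat_count(onnx_op_transform_threshold=3,
                           repeat_onnx_transform=0):
    if onnx_op_transform_threshold > 0:
        return onnx_op_transform_threshold
    return repeat_onnx_transform

assert effective_repeat_count() == 3        # defaults: threshold governs
assert effective_repeat_count(5, 10) == 5   # threshold > 0 wins
assert effective_repeat_count(0, 10) == 10  # fallback to repeatOnnxTransform
```

This would explain why setting repeatOnnxTransform=10 alone had no effect: the default threshold of 3 was still in force.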
I don't think we can expect our users to set up this option. Can we set it to a higher value?
- If no changes occur at iteration n, do we still execute all of the iterations, or do we have a simple mechanism to stop? I would not invest a lot of effort into this, but if it is simple, then we should consider it.
- If a concern, we could enable =10 only at -O3... not a big difference as we expect all our users to run at -O3.
> If no changes occur at iteration n, do we still execute all of the iterations, or do we have a simple mechanism to stop? I would not invest a lot of effort into this, but if it is simple, then we should consider it.
Yes. We already have a simple mechanism to stop it by checking a tag calculated from the IR (look at ONNXOpTransformPass::runOnOperation in ./src/Transform/ONNX/ONNXOpTransformPass.cpp).
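A change-detection tag of this kind can be modeled by hashing the printed IR (a sketch only; the actual check is in ONNXOpTransformPass::runOnOperation): if the tag is the same before and after a pass, nothing changed and the iteration can stop.

```python
import hashlib

# Sketch of IR change detection via a content tag (illustrative; the
# real check lives in ONNXOpTransformPass::runOnOperation). The idea:
# hash the printed module and stop iterating once the tag stops changing.
def ir_tag(printed_ir: str) -> str:
    return hashlib.sha256(printed_ir.encode("utf-8")).hexdigest()

before = '%0 = "onnx.Reshape"(%a, %s) : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>'
after = '%0 = "onnx.Identity"(%a) : (tensor<256x768xf32>) -> tensor<256x768xf32>'

assert ir_tag(before) != ir_tag(after)  # IR changed: keep iterating
assert ir_tag(after) == ir_tag(after)   # IR unchanged: stop
```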
> If a concern, we could enable =10 only at -O3... not a big difference as we expect all our users to run at -O3.
Because the iteration stops when no changes occur, it should not be a big issue to simply change the value from 3 to 10. I will create a PR to update the default value if you agree with the solution.
@negiyas please do, it should be an easy one. Thanks for the investigation and finding a fix.
And link the PR so we know when to close this one.
Created PR #1633 (https://github.com/onnx/onnx-mlir/pull/1633) to change the default value of "--onnx-op-transform-threshold" from 3 to 10. Confirmed that this issue is solved by the PR.
Can we close this issue now that PR #1633 has landed?
completed