onnx-mlir
Usage of --shapeInformation results in different outcome depending on the input values
Looking at the bertsquad-12.onnx model (https://github.com/onnx/models/tree/main/text/machine_comprehension/bert-squad/model):
When I do shape inference with the `?` dynamic values replaced by 4, the final model is fully shaped with constant shapes:

```shell
> onnx-mlir -O3 -v -EmitONNXIR bertsquad-12.onnx --shapeInformation="0:4,1:4x256,2:4x256,3:4x256"
> grep "?" bertsquad-12.tmp | wc -l
0
```
But when the `?` values are replaced by 1, not all of the shapes are compile-time shapes:

```shell
> onnx-mlir -O3 -v -EmitONNXIR bertsquad-12.onnx --shapeInformation="0:1,1:1x256,2:1x256,3:1x256"
> grep "?" bertsquad-12.tmp | wc -l
9
```
To me that does not totally make sense. It would be nice to have someone investigate the discrepancy.
@negiyas any insight on what is happening?
Please look at the following outputs for both cases.

**In the dynamic value = 1 case:**
```mlir
%1110 = "onnx.Mul"(%1096, %1104) : (tensor<256x768xf32>, tensor<256x1xf32>) -> tensor<256x768xf32>
%1111 = "onnx.Mul"(%1110, %1105) : (tensor<256x768xf32>, tensor<768xf32>) -> tensor<256x768xf32>
%1112 = "onnx.Add"(%1111, %1109) {onnx_node_name = "bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1"} : (tensor<256x768xf32>, tensor<256x768xf32>) -> tensor<256x768xf32>
%1113 = "onnx.Constant"() {value = dense<[256, 768]> : tensor<2xi64>} : () -> tensor<2xi64>
%1114 = "onnx.Constant"() {value = dense<[1, 256, 2]> : tensor<3xi64>} : () -> tensor<3xi64>
%1115 = "onnx.Reshape"(%1112, %1113) {allowzero = 0 : si64} : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>
%1116 = "onnx.Constant"() {value = dense<[1.25070888E-4, 1.26759609E-4]> : tensor<2xf32>} : () -> tensor<2xf32>
%1117 = "onnx.Gemm"(%1115, %92, %1116) {alpha = 1.000000e+00 : f32, beta = 1.000000e+00 : f32, transA = 0 : si64, transB = 0 : si64} : (tensor<?x?xf32>, tensor<768x2xf32>, tensor<2xf32>) -> tensor<?x2xf32>
```
**In the dynamic value = 4 case:**
```mlir
%1110 = "onnx.Mul"(%1096, %1104) : (tensor<1024x768xf32>, tensor<1024x1xf32>) -> tensor<1024x768xf32>
%1111 = "onnx.Mul"(%1110, %1105) : (tensor<1024x768xf32>, tensor<768xf32>) -> tensor<1024x768xf32>
%1112 = "onnx.Add"(%1111, %1109) {onnx_node_name = "bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1"} : (tensor<1024x768xf32>, tensor<1024x768xf32>) -> tensor<1024x768xf32>
%1113 = "onnx.Constant"() {value = dense<[4, 256, 2]> : tensor<3xi64>} : () -> tensor<3xi64>
%1114 = "onnx.Constant"() {value = dense<[1.25070888E-4, 1.26759609E-4]> : tensor<2xf32>} : () -> tensor<2xf32>
%1115 = "onnx.Gemm"(%1112, %92, %1114) {alpha = 1.000000e+00 : f32, beta = 1.000000e+00 : f32, transA = 0 : si64, transB = 0 : si64} : (tensor<1024x768xf32>, tensor<768x2xf32>, tensor<2xf32>) -> tensor<1024x2xf32>
```
Findings are as follows.
(1) In the dynamic value=1 case, the onnx.Constant and onnx.Reshape ops in %1113 and %1115 should be removed, because the input tensor shape (tensor<256x768xf32>) and the specified shape ([256, 768]) are identical.
(2) In the dynamic value=4 case, the onnx.Constant and onnx.Reshape ops are removed correctly.
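For illustration, the identity condition behind finding (1) can be sketched in Python (a hypothetical helper, not the actual onnx-mlir code): a Reshape is a no-op exactly when the statically known input shape equals the constant target shape.

```python
# Sketch of the identity-reshape condition (illustrative only, not the
# onnx-mlir implementation): a Reshape can be removed when the statically
# known input shape equals the constant target shape.
def is_identity_reshape(input_shape, target_shape):
    # Both shapes must be fully static (no unknown/-1 dims) to compare.
    if any(d < 0 for d in list(input_shape) + list(target_shape)):
        return False
    return list(input_shape) == list(target_shape)

# %1115 in the value=1 case: 256x768 reshaped with target [256, 768] -> removable
assert is_identity_reshape([256, 768], [256, 768])
# But if the pattern is handed a 3-D shape like [1, 256, 768] -> not removable
assert not is_identity_reshape([256, 768], [1, 256, 768])
```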
I am investigating why the RemoveIdentityReshapePattern in src/Dialect/ONNX/Rewrite.td does not work for the dynamic value=1 case. Your comments and suggestions are very welcome!
Maybe it needs another call of shape-inference. Have you tried to use repeatOnnxTransform, e.g., set it to 1 or more, to repeatedly call shape-inference and constant propagation?
Looking at this:

```mlir
%1113 = "onnx.Constant"() {value = dense<[256, 768]> : tensor<2xi64>} : () -> tensor<2xi64>
%1114 = "onnx.Constant"() {value = dense<[1, 256, 2]> : tensor<3xi64>} : () -> tensor<3xi64>
%1115 = "onnx.Reshape"(%1112, %1113) {allowzero = 0 : si64} : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>
```
I believe Reshape needs a call of shape-inference to get its correct output dimensions.
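For reference, ONNX Reshape lets the output shape be computed from a constant shape tensor; a minimal sketch of that inference (per the ONNX operator spec, not the onnx-mlir code) looks like this:

```python
# Minimal sketch of ONNX Reshape output-shape inference from a constant
# shape tensor (illustrative, not the onnx-mlir implementation).
# Per the ONNX spec: a 0 copies the corresponding input dim (when
# allowzero=0), and a single -1 is inferred from the element count.
def infer_reshape_shape(input_shape, target, allowzero=0):
    out = []
    for i, d in enumerate(target):
        if d == 0 and allowzero == 0:
            out.append(input_shape[i])  # copy dim from input
        else:
            out.append(d)
    if -1 in out:
        known = 1
        for d in out:
            if d != -1:
                known *= d
        total = 1
        for d in input_shape:
            total *= d
        out[out.index(-1)] = total // known  # infer the remaining dim
    return out

# %1115: 256x768 input reshaped with target [256, 768] -> [256, 768]
assert infer_reshape_shape([256, 768], [256, 768]) == [256, 768]
assert infer_reshape_shape([1, 256, 768], [256, -1]) == [256, 768]
```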
I tried repeatOnnxTransform=10, but the results were the same. Is there any reason why it works for the dynamic value=4 case, but does not work for the dynamic value=1 case?
I am looking at the RemoveIdentityReshapePattern now.
```tablegen
def RemoveIdentityReshapePattern: Pat<
  // Remove an identity pattern. Input tensor already has the specified shape.
  (ONNXReshapeOp $val, $shape, $az),
  // Remove the transpose.
  (replaceWithValue $val),
  // Check that val has the specified shape.
  [(HasSpecifiedConstantShape $val, $shape)]>;
```
And I am investigating inputs of the HasSpecifiedConstantShape function for the onnx.Reshape op that should be removed.
(1) In the dynamic value=1 case, HasSpecifiedConstantShape receives a 256x768xf32 tensor as the value and [1, 256, 768] as the shape, and returns false, because the value tensor is 2-D and the shape is 3-D. I suppose that the shape [1, 256, 768] passed to the function is wrong and should be [256, 768]. (If [256, 768] is passed as the shape, it returns true, and the onnx.Reshape op is removed as in the dynamic value=4 case.)
(2) In the dynamic value=4 case, HasSpecifiedConstantShape receives a 1024x768xf32 tensor as the value and [1024, 768] as the shape, and returns true. That seems reasonable.
In the dynamic value=1 case:

```
ZZZ value[256, 768, ] => shape[1, 256, 12, 64, ] return FALSE
ZZZ value[256, 768, ] => shape[1, 256, 12, 64, ] return FALSE
ZZZ value[256, 768, ] => shape[1, 256, 12, 64, ] return FALSE
ZZZ value[1, 256, 12, 64, ] => shape[256, 768, ] return FALSE
ZZZ value[256, 768, ] => shape[1, 256, 768, ] return FALSE  # for %1115 = "onnx.Reshape"(%1112, %1113) {allowzero = 0 : si64} : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>
```

In the dynamic value=4 case:

```
ZZZ value[1024, 768, ] => shape[4, 256, 12, 64, ] return FALSE
ZZZ value[1024, 768, ] => shape[4, 256, 12, 64, ] return FALSE
ZZZ value[1024, 768, ] => shape[4, 256, 12, 64, ] return FALSE
ZZZ value[4, 256, 12, 64, ] => shape[1024, 768, ] return FALSE
ZZZ value[1024, 768, ] => shape[1024, 768, ] return TRUE  # and the onnx.Reshape op is removed
```
I am investigating why the shape [1, 256, 768] (which should be [256, 768]) is passed to HasSpecifiedConstantShape in the dynamic value=1 case.
Your comments and suggestions are very welcome!
> Please look at the following outputs of the both cases.
Good to see that you are finding inefficiencies that can be further optimized in the ?=1 case. Please try to also look at why we have more dyn cases that remain in the ?=1 case, which was the original issue with shape inference. We really need to remove as many dyn refs as possible for NNPA.
@AlexandreEichenberger Thank you for your comments. I found that "--onnx-op-transform-threshold=5" (not "--repeatOnnxTransform=5") option can solve this issue. This option can change the maximum number of transform pass repeat times.
The default value is currently 3, but we need at least 5 iterations in this case. A larger default value (e.g., 5 or 10) does not cause a big issue, because the transform pass stops if a pass does not change the target code at all, as you know. I propose changing the default value from 3 to 5 (or more) to avoid similar issues.
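The behavior described above can be sketched as a fixed-point loop (a simplified model with illustrative names, not the actual ONNXOpTransformPass code): run up to `threshold` iterations, but stop early as soon as an iteration leaves the IR unchanged.

```python
# Simplified model of the repeated transform pipeline (illustrative
# names, not the actual onnx-mlir code): iterate up to `threshold`
# times, stopping early once an iteration changes nothing.
def run_transform_pipeline(ir, transform, threshold=3):
    iterations = 0
    for _ in range(threshold):
        new_ir = transform(ir)
        iterations += 1
        if new_ir == ir:  # fixed point reached: stop early
            break
        ir = new_ir
    return ir, iterations

# Toy transform that needs 5 changing steps to converge:
step = lambda n: max(n - 1, 0)

_, iters = run_transform_pipeline(5, step, threshold=3)
assert iters == 3  # threshold=3 gives up before convergence

result, iters = run_transform_pipeline(5, step, threshold=10)
assert result == 0 and iters == 6  # 5 changing passes + 1 no-change pass
```

This is why raising the threshold is cheap: once the IR stabilizes, the loop exits regardless of how large the threshold is.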
Your comments and suggestions are very welcome. Thank you.
In the previous comment:

> I tried repeatOnnxTransform=10, but the results were the same.

and now:

> I found that "--repeatOnnxTransform=5" option can solve this issue.

Does it mean value 5 worked but 10 did not?
I am sorry, I used the "--onnx-op-transform-threshold=5" (not "--repeatOnnxTransform=5") option. (I updated my previous comments.) It seems that the "--repeatOnnxTransform" option is used only when the "onnxOpTransformThreshold" value is <= 0, which is set by the "--onnx-op-transform-threshold" option (default value is 3).
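As I understand the interaction described above, the option selection looks roughly like this (a sketch based on that description, not the actual onnx-mlir code):

```python
# Sketch of how the two options appear to interact, based on the
# description above (illustrative, not the actual onnx-mlir code):
# --repeatOnnxTransform is consulted only when
# --onnx-op-transform-threshold is set to a value <= 0.
def effective_repeat_count(onnx_op_transform_threshold=3,
                           repeat_onnx_transform=0):
    if onnx_op_transform_threshold > 0:
        return onnx_op_transform_threshold
    return repeat_onnx_transform

assert effective_repeat_count() == 3        # defaults: threshold governs
assert effective_repeat_count(5, 10) == 5   # threshold > 0 wins
assert effective_repeat_count(0, 10) == 10  # fallback to repeatOnnxTransform
```

This would explain why setting repeatOnnxTransform=10 alone had no effect: the default threshold of 3 was still in force.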
I don't think we can expect our users to set up this option. Can we set it to a higher value?
- If no changes occur at iteration n, do we still execute all of the iterations, or do we have a simple mechanism to stop? I would not invest a lot of effort into this, but if it is simple, then we should consider it.
- If a concern, we could enable =10 only at -O3... not a big difference as we expect all our users to run at -O3.
> If no changes occur at iteration n, do we still execute all of the iterations, or do we have a simple mechanism to stop? I would not invest a lot of effort into this, but if it is simple, then we should consider it.
Yes. We already have a simple mechanism to stop it by checking a tag calculated from the IR (look at ONNXOpTransformPass::runOnOperation in ./src/Transform/ONNX/ONNXOpTransformPass.cpp).
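A change-detection tag of this kind can be modeled by hashing the printed IR (a sketch only; the actual check is in ONNXOpTransformPass::runOnOperation): if the tag is the same before and after a pass, nothing changed and the iteration can stop.

```python
import hashlib

# Sketch of IR change detection via a content tag (illustrative; the
# real check lives in ONNXOpTransformPass::runOnOperation). The idea:
# hash the printed module and stop iterating once the tag stops changing.
def ir_tag(printed_ir: str) -> str:
    return hashlib.sha256(printed_ir.encode("utf-8")).hexdigest()

before = '%0 = "onnx.Reshape"(%a, %s) : (tensor<256x768xf32>, tensor<2xi64>) -> tensor<?x?xf32>'
after = '%0 = "onnx.Identity"(%a) : (tensor<256x768xf32>) -> tensor<256x768xf32>'

assert ir_tag(before) != ir_tag(after)  # IR changed: keep iterating
assert ir_tag(after) == ir_tag(after)   # IR unchanged: stop
```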
> If a concern, we could enable =10 only at -O3... not a big difference as we expect all our users to run at -O3.
Because the iteration stops when no changes occur, it should not be a big issue to simply change the value from 3 to 10. I will create a PR to update the default value if you agree with the solution.
@negiyas please do, it should be an easy one. Thanks for the investigation and finding a fix.
And link the PR so we know when to close this one.
Created PR #1633 (https://github.com/onnx/onnx-mlir/pull/1633) to change the default value of "--onnx-op-transform-threshold" from 3 to 10. Confirmed that this issue is solved by the PR.
Can we close this issue now that PR #1633 has landed?
completed