
Error Code 4: Miscellaneous (IShuffleLayer Reshape_427: reshape changes volume. Reshaping [900,1,256] to [900,7200,32].)

Open liangguixing95 opened this issue 3 years ago • 15 comments

Hello, when I converted my ONNX model to TensorRT with the command ./trtexec --onnx=model.onnx --saveEngine=model.engine, I got a big diff between the PyTorch result and the TRT result. I located the problem, which might be related to the decoder transformer part of my model, so I converted only the transformer part to ONNX to try to find out what is wrong. But when I ran ./trtexec --onnx=decoder_transformer.onnx --saveEngine=decoder_transformer.engine to convert the ONNX model to TRT, I got an error which did not appear while converting "model.onnx". The error comes from the cross attention part, but it disappears when I convert only the cross attention module to ONNX and TRT with ./trtexec --onnx=cross_attention.onnx --saveEngine=cross_attention.engine. So in the end I cannot figure out how to get a correct TRT result, and I am opening an issue for some help. Thanks~

Environment
TensorRT Version: 8.4.1.5+cuda11.6
NVIDIA GPU: A100
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.6
CUDNN Version: 8.4.0.27
Operating System: Ubuntu 20.04.2 LTS
Python Version: 3.7.13
PyTorch Version: 1.10

liangguixing95 avatar Aug 15 '22 07:08 liangguixing95

Usually this happens when your model has a dynamic input shape and a fixed reshape operation. Can you check that first?

zerollzeng avatar Aug 15 '22 12:08 zerollzeng

I got this same error. What do you want me to check? @zerollzeng Edit: I am training using the balloon example (I don't know where the link is anymore) and used their dataset and configurations.

frankvp11 avatar Aug 16 '22 18:08 frankvp11

Check the ONNX model first, e.g. run it with ONNX Runtime with preset input shapes.

zerollzeng avatar Aug 17 '22 09:08 zerollzeng

The problem here is simple: suppose you have a Reshape layer that reshapes a tensor to 2x6, and its input has shape axb. Then axb must equal 2x6=12.
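The rule above can be sketched in NumPy, which enforces the same element-count (volume) check as TensorRT's Reshape. This is only an illustration of the constraint, not TensorRT code:

```python
import numpy as np

# The volume (total element count) must be identical before and
# after a reshape, in NumPy just as in TensorRT's IShuffleLayer.
x = np.zeros((3, 4))          # volume 3*4 = 12
ok = x.reshape(2, 6)          # 2*6 = 12 -> allowed
print(ok.shape)               # (2, 6)

try:
    x.reshape(2, 7)           # 2*7 = 14 != 12 -> rejected
except ValueError as e:
    print("reshape failed:", e)

# The error in this issue is the same check failing inside TensorRT:
# volume([900, 1, 256]) != volume([900, 7200, 32]).
print(np.prod([900, 1, 256]), np.prod([900, 7200, 32]))
```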

zerollzeng avatar Aug 17 '22 09:08 zerollzeng

Yeah - I made another issue explaining my problem more closely, but I already knew what you meant. I'll check it later with ONNX Runtime.

frankvp11 avatar Aug 17 '22 09:08 frankvp11

I've found the reason, which is related to the layer norm. In my model, the input of LN is a tensor of shape [900,1,256], and the LN function is called as nn.functional.layer_norm(input, [256,]). The output in the PyTorch version has no problem, but the ONNX version gets a wrong output shape of [900,900,256]. I fixed the problem by changing the call to nn.functional.layer_norm(input, [1, 256]). You can check whether your code has the same problem @frankvp11
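For what it's worth, the two normalized_shape choices are numerically equivalent for a [900,1,256] input, because the extra axis has size 1 - so the workaround only changes how the op is exported to ONNX, not the result. A minimal NumPy re-implementation of F.layer_norm (an illustration of its semantics, not the PyTorch source) shows this:

```python
import numpy as np

def layer_norm(x, normalized_shape, eps=1e-5):
    """Minimal NumPy sketch of F.layer_norm: normalize over the
    trailing len(normalized_shape) axes."""
    axes = tuple(range(x.ndim - len(normalized_shape), x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(900, 1, 256).astype(np.float32)

y1 = layer_norm(x, (256,))     # normalize over the last axis
y2 = layer_norm(x, (1, 256))   # normalize over the last two axes

# The middle axis has size 1, so both choices reduce over the same
# 256 elements per row and the outputs are identical.
print(np.abs(y1 - y2).max())
```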

liangguixing95 avatar Aug 19 '22 06:08 liangguixing95

I've fixed the shape error but ran into a new problem: the outputs of the ONNX model and the TRT FP32 engine are quite different after the torch.bmm operator in the cross attention module. I compared the outputs of q, k, and attn between ONNX and TRT and printed the max diff of each pair. q and k are the same, but attn is quite different, as shown below. I have no idea how to solve this. @zerollzeng

liangguixing95 avatar Aug 19 '22 07:08 liangguixing95

I'm working with Detectron2, so it's impossible for me to realistically edit the source code.

frankvp11 avatar Aug 19 '22 10:08 frankvp11

I compared the outputs of q, k, and attn between ONNX and TRT and printed the max diff of each pair. q and k are the same, but attn is quite different, as shown below. I have no idea how to solve this

Can you provide a reproduction so that I can check it on my side? I would prefer a minimal ONNX model.

zerollzeng avatar Aug 20 '22 11:08 zerollzeng

https://drive.google.com/drive/folders/13LGb4uCEzrLV4k1dRa9FBHPnrrAwXfSf?usp=sharing Here are the ONNX model and some debug inputs I used to produce the diff comparison log.

liangguixing95 avatar Aug 22 '22 02:08 liangguixing95

I can't reproduce it using Polygraphy; all outputs match:

[I] Accuracy Comparison | trt-runner-N0-08/22/22-15:50:44 vs. onnxrt-runner-N0-08/22/22-15:50:44
[I]     Comparing Output: '72' (dtype=float32, shape=(8, 900, 32)) with '72' (dtype=float32, shape=(8, 900, 32))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-08/22/22-15:50:44: 72 | Stats: mean=-0.0027745, std-dev=0.1346, var=0.018118, median=-7.5492e-05, min=-0.53595 at (2, 16, 0), max=0.58039 at (2, 300, 21), avg-magnitude=0.10865
[I]         onnxrt-runner-N0-08/22/22-15:50:44: 72 | Stats: mean=-0.0027745, std-dev=0.1346, var=0.018118, median=-7.5492e-05, min=-0.53595 at (2, 16, 0), max=0.58039 at (2, 300, 21), avg-magnitude=0.10865
[I]         Error Metrics: 72
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]         PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     Comparing Output: '73' (dtype=float32, shape=(8, 12000, 32)) with '73' (dtype=float32, shape=(8, 12000, 32))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-08/22/22-15:50:44: 73 | Stats: mean=0.062328, std-dev=0.72619, var=0.52735, median=0.055339, min=-3.2914 at (3, 5027, 19), max=3.1621 at (1, 3771, 3), avg-magnitude=0.5761
[I]         onnxrt-runner-N0-08/22/22-15:50:44: 73 | Stats: mean=0.062328, std-dev=0.72619, var=0.52735, median=0.055339, min=-3.2914 at (3, 5027, 19), max=3.1621 at (1, 3771, 3), avg-magnitude=0.5761
[I]         Error Metrics: 73
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]         PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     Comparing Output: '76' (dtype=float32, shape=(8, 900, 12000)) with '76' (dtype=float32, shape=(8, 900, 12000))
[I]     Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-08/22/22-15:50:44: 76 | Stats: mean=-0.24013, std-dev=0.44643, var=0.1993, median=-0.23786, min=-3.2709 at (2, 191, 11177), max=2.4214 at (1, 174, 3771), avg-magnitude=0.40642
[I]         onnxrt-runner-N0-08/22/22-15:50:44: 76 | Stats: mean=-0.24013, std-dev=0.44643, var=0.1993, median=-0.23786, min=-3.2709 at (2, 191, 11177), max=2.4214 at (1, 174, 3771), avg-magnitude=0.40642
[I]         Error Metrics: 76
[I]             Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]             Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0), max=0 at (0, 0, 0), avg-magnitude=0
[I]         PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['72', '73', '76']
[I] PASSED | Command: /usr/local/bin/polygraphy run module.onnx --trt --onnxrt

zerollzeng avatar Aug 22 '22 15:08 zerollzeng

A suggestion: after constant folding, the network structure is simpler:

polygraphy surgeon sanitize module.onnx --fold-constants -o module_folded.onnx

zerollzeng avatar Aug 22 '22 15:08 zerollzeng

@zerollzeng does constant folding make the model better/faster?

frankvp11 avatar Aug 23 '22 12:08 frankvp11

@zerollzeng does constant folding make the model better/faster? Constant folding brings some performance degradation in my case. The ONNX file provided is a minimal part of the cross attention module in my model. Running the ONNX model with Polygraphy suggests there may be no problem, but when using the real data, the max diff of the outputs is quite large, as the log above shows.

liangguixing95 avatar Aug 24 '22 08:08 liangguixing95

Constant folding brings some performance degradation in my case. The ONNX file provided is a minimal part of the cross attention module in my model. Running the ONNX model with Polygraphy suggests there may be no problem, but when using the real data, the max diff of the outputs is quite large, as the log above shows.

Are you using the real data as input? The mismatch might be caused by your input data; e.g. if you feed random binary data to the model, it might contain large values like 1e+6.
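A quick way to act on this advice is to print range statistics for every input tensor before running an accuracy comparison; out-of-distribution values make float32 absolute diffs explode. summarize_input below is a hypothetical helper sketched for this thread, not part of Polygraphy or ONNX Runtime:

```python
import numpy as np

def summarize_input(name, arr):
    """Print basic range statistics so out-of-distribution inputs
    (e.g. raw bytes reinterpreted as float32) are easy to spot."""
    arr = np.asarray(arr)
    print(f"{name}: shape={arr.shape} dtype={arr.dtype} "
          f"min={arr.min():.4g} max={arr.max():.4g} "
          f"mean={arr.mean():.4g} finite={np.isfinite(arr).all()}")
    return float(arr.min()), float(arr.max())

# A well-scaled feature tensor ...
lo, hi = summarize_input("q", np.random.randn(8, 900, 32).astype(np.float32))

# ... vs. garbage values that would blow up the attention logits
# after a bmm and dwarf any reasonable tolerance:
bad = np.random.uniform(-1e6, 1e6, size=(8, 900, 32)).astype(np.float32)
summarize_input("bad", bad)
```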

zerollzeng avatar Aug 24 '22 16:08 zerollzeng

Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!

ttyio avatar Dec 06 '22 02:12 ttyio

Use NGC pytorch:22.12-py3 instead of pytorch:22.07-py3 to fix "Error Code 4: Miscellaneous (IShuffleLayer Reshape_179: reshape changes volume. Reshaping [784] to [1].)"

fanchuanster avatar Jun 29 '23 14:06 fanchuanster

I also came across this problem:

[05/11/2024-15:07:32] [V] [TRT] Insert CopyNode after ConstantNode that produces a Myelin graph output: 25021
[05/11/2024-15:07:33] [E] Error[4]: [shapeCompiler.cpp::evaluateShapeChecks::1180] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: IShuffleLayer Reshape_1933: reshaping failed for tensor: 3516 Reshape would change volume.)
[05/11/2024-15:07:33] [E] Error[2]: [builder.cpp::buildSerializedNetwork::743] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[05/11/2024-15:07:33] [E] Engine could not be created from network
[05/11/2024-15:07:33] [E] Building engine failed
[05/11/2024-15:07:33] [E] Failed to create engine from model or file.
[05/11/2024-15:07:33] [E] Engine set up failed

All of the ONNX model's inputs have fixed shapes, but the inner network has data-dependent ops like NonZero. If I replace all code related to data-dependent operations with plugin implementations, the errors no longer occur.
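For readers hitting the same wall: a data-dependent op is one whose output shape depends on the input values, not just the input shapes, which is why fixing the model's input shapes does not help. A small NumPy sketch of NonZero (illustrative only) makes this concrete:

```python
import numpy as np

# Two inputs with the SAME fixed shape ...
a = np.array([0.0, 1.5, 0.0, 2.0], dtype=np.float32)
b = np.array([0.0, 0.0, 0.0, 2.0], dtype=np.float32)

# ... but NonZero's output shape depends on the values inside:
print(np.nonzero(a)[0].shape)
print(np.nonzero(b)[0].shape)

# A fixed Reshape downstream of such an op therefore sees a varying
# volume at build time, tripping the same "reshape changes volume"
# check. A plugin that emits a fixed-size (e.g. padded) output
# sidesteps the data-dependent shape entirely.
```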

lix19937 avatar May 11 '24 07:05 lix19937