TensorRT
polygraphy error
Description
I use Polygraphy to debug an ONNX model:
polygraphy run model.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --tactic-sources CUBLAS --fp16 --atol 1e-3 --rtol 1e-3 --val-range [0,1]
and get the following error:
[07/29/2022-16:38:50] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/29/2022-16:38:50] [TRT] [W] Weights [name=onnx::MatMul_3172 + (Unnamed Layer* 330) [Shuffle]] had the following issues when converted to FP16:
[07/29/2022-16:38:50] [TRT] [W] - Subnormal FP16 values detected.
[07/29/2022-16:38:50] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/29/2022-16:38:50] [TRT] [W] Skipping tactic 0x0000000000000000 due to Myelin error: Formal output tensor "encoder_cross_views_0_cross_attend_mlp_0_bias _ (Unnamed Layer_ 316) [Shuffle]_constant" is also a data tensor.
[07/29/2022-16:38:50] [TRT] [E] 10: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[q...Reshape_980]}.)
[07/29/2022-16:38:50] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
But when I use trtexec to convert the ONNX model to a TensorRT engine, it succeeds.
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: RTX 3090
NVIDIA Driver Version: 470.74
CUDA Version: 11.4
Operating System: Ubuntu 18.04
Can you try marking fewer outputs?
I removed mark all, but it also failed:
polygraphy run model.onnx --trt --onnxrt --tactic-sources CUBLAS --fp16 --atol 1e-3 --rtol 1e-3 --val-range [0,1]
Below is the output:
[08/01/2022-09:43:39] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[08/01/2022-09:43:39] [TRT] [W] Weights [name=onnx::MatMul_3171 + (Unnamed Layer* 313) [Shuffle]] had the following issues when converted to FP16:
[08/01/2022-09:43:39] [TRT] [W] - Subnormal FP16 values detected.
[08/01/2022-09:43:39] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[08/01/2022-09:43:39] [TRT] [W] Weights [name=onnx::MatMul_3172 + (Unnamed Layer* 330) [Shuffle]] had the following issues when converted to FP16:
[08/01/2022-09:43:39] [TRT] [W] - Subnormal FP16 values detected.
[08/01/2022-09:43:39] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[08/01/2022-09:43:46] [TRT] [W] Skipping tactic 0 due to insufficient memory on requested size of 444487680 detected for tactic 0x0000000000000000.
[08/01/2022-09:43:46] [TRT] [W] cuDNN, cuBLAS or cuBLASLt library is still required on networks with loop, boolean operators or transformer based architectures even if it is disabled through TacticSources APIs.
[08/01/2022-09:43:53] [TRT] [W] Skipping tactic 0 due to insufficient memory on requested size of 444487680 detected for tactic 0x0000000000000000.
[08/01/2022-09:43:53] [TRT] [E] 10: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[q...Reshape_973 + Transpose_974 + Reshape_980]}.)
[08/01/2022-09:43:53] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
I met the same error.
Can you try increasing the workspace size? e.g. --pool-limit workspace:1G
I set --pool-limit workspace:2G or --pool-limit workspace:20G, but it did not work.
My network is a transformer. Running polygraphy run ./transformer.onnx --trt works, but polygraphy run ./transformer.onnx --trt-outputs mark all --pool-limit workspace:2G does not.
When I extracted the ONNX model down to just the "input_mask -> unsqueeze" ops, the same error arose. Maybe the polygraphy tool doesn't support the Unsqueeze op?
When I set --trt-outputs mark all, the same error arose, but when I set --trt-outputs to a specific layer, e.g. --trt-outputs 10, it's OK. What's the reason for this bug?
It sounds like a TensorRT bug. Can you share the extracted model where you're seeing this?
The transformer of wav2vec2, https://github.com/facebookresearch/fairseq
Here is the model: https://drive.google.com/file/d/1Uva7yyq9f9AQ406QzVJrJRVr8M9BykYR/view?usp=sharing
Thanks, I've filed an internal issue to track this (internal id: 3742810). In the meantime, you should be able to work around this by either marking a specific set of outputs (instead of mark all) and increasing the workspace size or using polygraphy debug reduce (see this example).
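For example, a sketch of the debug reduce route (model.onnx is a placeholder for your model; each iteration writes the candidate subgraph to polygraphy_debug.onnx, which the --check command then tests, and --fp16 is assumed here because the original failure occurred with an FP16 build):
polygraphy debug reduce model.onnx -o final_reduced.onnx --check polygraphy run polygraphy_debug.onnx --trt --fp16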
I have the same error, and I want to know whether there is a precision error in every layer, so it seems mark all is necessary for me. How can I use polygraphy debug reduce to find all the layers that have precision errors?
debug reduce will remove layers from the model until you're left with a minimal model that reproduces the failure. So it won't show you the precision loss at each layer, but if you're seeing poor accuracy, it would at least pinpoint which part of the graph is causing it.
Does marking a smaller set of outputs not help in your case? Seems to me like that still has value even if you can't see the outputs of each layer.
Yes, it works. However, in a very large model, it seems it can only locate the nearest wrong layer. I need to constantly modify the ONNX file and then repeat the process. Converting a model is really a very big project.
Could you explain why you need to modify the ONNX file? You should be able to do, e.g. --trt-outputs <output_name_0> <output_name_1> ... <output_name_N> and similarly for --onnx-outputs.
Node 614 | Relu_1859 [Op: Relu]
{input.563 [dtype=float32, shape=()]}
-> {onnx::Conv_2848 [dtype=float32, shape=()]}
Node 615 | Conv_1860 [Op: Conv]
{onnx::Conv_2848 [dtype=float32, shape=()],
Initializer | to_logits.3.weight [dtype=float32, shape=(2, 64, 1, 1)],
Initializer | to_logits.3.bias [dtype=float32, shape=(2,)]}
-> {z [dtype=float32, shape=()]}
Node 616 | Slice_1865 [Op: Slice]
{z [dtype=float32, shape=()],
Initializer | onnx::Unsqueeze_987 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)]}
-> {bev [dtype=float32, shape=()]}
Node 617 | Slice_1870 [Op: Slice]
{z [dtype=float32, shape=()],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Slice_2769 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)]}
-> {center [dtype=float32, shape=()]}
I inspected my folded.onnx; there are about 600 nodes in it, which seems too many to list by hand. By the way, I want to know what the polygraphy_debug_replay.json file means. This is my file:
{ "_N0_outputs": [ false, [ 192 ] ], "_N1_outputs": [ false, [ 155 ] ], "_N2_outputs": [ false, [ 78 ] ], "_N3_outputs": [ false, [ 39 ] ], "_N4_outputs": [ false, [ 20 ] ], "_N5_outputs": [ false, [ 10 ] ], "_N6_outputs": [ false, [ 5 ] ], "_N7_outputs": [ false, [ 3 ] ], "_N8_outputs": [ false, [ 2 ] ], "_N0_inputs": [ false, [ 1 ] ] }
It allows you to resume from wherever polygraphy debug left off, e.g. you can run polygraphy debug reduce --load-debug-replay polygraphy_debug_replay.json ... and it will skip ahead to the last iteration in the replay file. You can refer to the help output for details.
Thank you for your work. I still have doubts: if there is more than one error in my model, it seems that final_reduced.onnx can only locate one mistake for me. How can I keep all the wrong layers across the iterations?
That's a little tricky to do because the intended usage of debug reduce is to iteratively fix bugs. That is, you would run debug reduce to create a minimal reproducer, fix the bug, then run it again on the original model to create a reproducer for the next bug. Since the common case is a small number of bugs (usually just one in my experience), this typically works well.
For what you're trying to do, debug reduce would need to explore the remainder of the graph after finding a minimal failing subgraph. You could do something like that with manual effort - e.g. if we consider a simple example graph like:
A -> B -> C -> D -> E
where layers B, C, and D all have separate errors, then debug reduce might initially find a model containing B. After that, you could use surgeon extract to extract the C -> D -> E part of the model and re-run debug reduce on that. Then repeat for the D -> E portion.
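A hedged sketch of one such round (the tensor names C_input and E_output are placeholders for the actual boundary tensors in your model; auto tells Polygraphy to infer the shape and dtype):
polygraphy surgeon extract model.onnx -o c_to_e.onnx --inputs C_input:auto:auto --outputs E_output:auto
polygraphy debug reduce c_to_e.onnx -o reduced.onnx --check polygraphy run polygraphy_debug.onnx --trt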
Another thing to explore would be the --artifacts option, although that also wouldn't give you exactly what you want.
If you did:
polygraphy debug reduce --artifacts polygraphy_debug.onnx ...
This would generate two directories - polygraphy_artifacts/good and polygraphy_artifacts/bad, containing the passing and failing subgraphs from the entire reduction process. You could manually compare all the subgraphs to guess which layers might be problematic.
Hope that helps
@pranavm-nvidia I will try it. Thank you for your reply.
Hi, I have a question: how can I compare every output of two engines with Polygraphy, e.g. an fp16_engine and an fp32_engine?
The outputs of the engine can't be changed once it's built, so you'd need to mark whichever outputs you need while building. After that, you could do:
polygraphy run fp16_engine --trt --model-type engine --save-outputs fp16_outputs.json
then:
polygraphy run fp32_engine --trt --model-type engine --load-outputs fp16_outputs.json
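If you haven't built the engines yet, one way to bake the extra outputs in up front is a sketch like the following (model.onnx and the engine file names are placeholders; note that mark all may hit the bug discussed above, in which case mark a specific set of tensors instead):
polygraphy run model.onnx --trt --trt-outputs mark all --save-engine fp32_engine
polygraphy run model.onnx --trt --fp16 --trt-outputs mark all --save-engine fp16_engine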
Thanks. How do I compare the precision of the outputs of the two engines? By setting --atol and --rtol?
For example, polygraphy run fp32_engine.txt --trt --model-type engine --save-outputs fp32_outputs.json and then polygraphy run fp16_engine.txt --trt --model-type engine --load-outputs fp32_outputs.json --atol 0.1 --rtol 0.1, right? Can this method find which layer of the fp16 engine exceeds the absolute or relative tolerance?
Yes, exactly
Thanks!
Hi, this method only gets the final output data of the engines. Now I want to find out which layer exceeds the tolerance. How can I get the outputs of every layer, or is there another method to compare every layer's outputs across the two engines?
You'd need to mark the outputs you want to compare when you build the engine. As I mentioned, there's no way to retrieve values for tensors which weren't marked as outputs while building the engine. If you're building the engine with Polygraphy, you can use the normal --trt-outputs ... option. If you're building with the API, you can do network.mark_output(...).
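A minimal sketch of the API route (assumptions: the model file is named model.onnx, and you want every layer output marked; in practice you may filter this to just the tensors you care about):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read())

# Mark every layer's output tensors so they can be compared after inference.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        tensor = layer.get_output(j)
        # Skip tensors that are already outputs; shape tensors would need
        # mark_output_for_shapes, so this simple loop skips them too.
        if tensor.is_network_output or tensor.is_shape_tensor:
            continue
        network.mark_output(tensor)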
Thanks
We have a fix in TensorRT; it will be included in the next release.