TensorRT
polygraphy error
Description
I use Polygraphy to debug an ONNX model:
polygraphy run model.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all --tactic-sources CUBLAS --fp16 --atol 1e-3 --rtol 1e-3 --val-range [0,1]
and get the following error:
[07/29/2022-16:38:50] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/29/2022-16:38:50] [TRT] [W] Weights [name=onnx::MatMul_3172 + (Unnamed Layer* 330) [Shuffle]] had the following issues when converted to FP16:
[07/29/2022-16:38:50] [TRT] [W] - Subnormal FP16 values detected.
[07/29/2022-16:38:50] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[07/29/2022-16:38:50] [TRT] [W] Skipping tactic 0x0000000000000000 due to Myelin error: Formal output tensor "encoder_cross_views_0_cross_attend_mlp_0_bias _ (Unnamed Layer_ 316) [Shuffle]_constant" is also a data tensor.
[07/29/2022-16:38:50] [TRT] [E] 10: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[q...Reshape_980]}.)
[07/29/2022-16:38:50] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
But when I use trtexec to convert the ONNX model to a TensorRT engine, it succeeds.
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: RTX 3090
NVIDIA Driver Version: 470.74
CUDA Version: 11.4
Operating System: Ubuntu 18.04
Can you try marking fewer outputs?
I removed mark all, but it also failed:
polygraphy run model.onnx --trt --onnxrt --tactic-sources CUBLAS --fp16 --atol 1e-3 --rtol 1e-3 --val-range [0,1]
Below is the output:
[08/01/2022-09:43:39] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[08/01/2022-09:43:39] [TRT] [W] Weights [name=onnx::MatMul_3171 + (Unnamed Layer* 313) [Shuffle]] had the following issues when converted to FP16:
[08/01/2022-09:43:39] [TRT] [W] - Subnormal FP16 values detected.
[08/01/2022-09:43:39] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[08/01/2022-09:43:39] [TRT] [W] Weights [name=onnx::MatMul_3172 + (Unnamed Layer* 330) [Shuffle]] had the following issues when converted to FP16:
[08/01/2022-09:43:39] [TRT] [W] - Subnormal FP16 values detected.
[08/01/2022-09:43:39] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.
[08/01/2022-09:43:46] [TRT] [W] Skipping tactic 0 due to insufficient memory on requested size of 444487680 detected for tactic 0x0000000000000000.
[08/01/2022-09:43:46] [TRT] [W] cuDNN, cuBLAS or cuBLASLt library is still required on networks with loop, boolean operators or transformer based architectures even if it is disabled through TacticSources APIs.
[08/01/2022-09:43:53] [TRT] [W] Skipping tactic 0 due to insufficient memory on requested size of 444487680 detected for tactic 0x0000000000000000.
[08/01/2022-09:43:53] [TRT] [E] 10: [optimizer.cpp::computeCosts::3628] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[q...Reshape_973 + Transpose_974 + Reshape_980]}.)
[08/01/2022-09:43:53] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[!] Invalid Engine. Please ensure the engine was built correctly
I met the same error.
Can you try increasing the workspace size? e.g. --pool-limit workspace:1G
I set --pool-limit workspace:2G or --pool-limit workspace:20G, but it did not work.
My network is a transformer. Running polygraphy run ./transformer.onnx --trt works, but polygraphy run ./transformer.onnx --trt-outputs mark all --pool-limit workspace:2G does not.
When I extracted the ONNX model down to just the "input_mask -> unsqueeze" ops, the same error arose. Maybe the polygraphy tool doesn't support the Unsqueeze op?
When I set --trt-outputs mark all, the same error arose, but when I set --trt-outputs to a specific layer, e.g. --trt-outputs 10, it's OK. What's the reason for this bug?
It sounds like a TensorRT bug. Can you share the extracted model where you're seeing this?
The transformer of wav2vec2, https://github.com/facebookresearch/fairseq
Here is the model: https://drive.google.com/file/d/1Uva7yyq9f9AQ406QzVJrJRVr8M9BykYR/view?usp=sharing
Thanks, I've filed an internal issue to track this (internal id: 3742810). In the meantime, you should be able to work around this by either marking a specific set of outputs (instead of mark all) and increasing the workspace size or using polygraphy debug reduce (see this example).
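For example, a sketch of the debug reduce route (model.onnx is a placeholder for your model; each iteration writes the candidate subgraph to polygraphy_debug.onnx, which the --check command then tests, and --fp16 is assumed here because the original failure occurred with an FP16 build):
polygraphy debug reduce model.onnx -o final_reduced.onnx --check polygraphy run polygraphy_debug.onnx --trt --fp16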
I have the same error, and I want to know whether there is a precision error in every layer, so it seems mark all is necessary for me. How can I use polygraphy debug reduce to find all the layers that have precision errors?
debug reduce will remove layers from the model until you're left with a minimal model that reproduces the failure. So it won't show you the precision loss at each layer, but if you're seeing poor accuracy, it would at least pinpoint which part of the graph is causing it.
Does marking a smaller set of outputs not help in your case? Seems to me like that still has value even if you can't see the outputs of each layer.
Yes, it works. However, in a very large model, it seems it can only locate the nearest wrong layer. I need to constantly modify the ONNX file and then repeat the process. Converting a model is really a very big project.
Could you explain why you need to modify the ONNX file? You should be able to do, e.g. --trt-outputs <output_name_0> <output_name_1> ... <output_name_N> and similarly for --onnx-outputs.
Node 614 | Relu_1859 [Op: Relu]
{input.563 [dtype=float32, shape=()]}
-> {onnx::Conv_2848 [dtype=float32, shape=()]}
Node 615 | Conv_1860 [Op: Conv]
{onnx::Conv_2848 [dtype=float32, shape=()],
Initializer | to_logits.3.weight [dtype=float32, shape=(2, 64, 1, 1)],
Initializer | to_logits.3.bias [dtype=float32, shape=(2,)]}
-> {z [dtype=float32, shape=()]}
Node 616 | Slice_1865 [Op: Slice]
{z [dtype=float32, shape=()],
Initializer | onnx::Unsqueeze_987 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)]}
-> {bev [dtype=float32, shape=()]}
Node 617 | Slice_1870 [Op: Slice]
{z [dtype=float32, shape=()],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Slice_2769 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)],
Initializer | onnx::Concat_3089 [dtype=int64, shape=(1,)]}
-> {center [dtype=float32, shape=()]}
I inspected my folded.onnx; there are about 600 nodes in it, which seems too many to list by hand. By the way, I want to know what the polygraphy_debug_replay.json file means. This is my file:
{ "_N0_outputs": [ false, [ 192 ] ], "_N1_outputs": [ false, [ 155 ] ], "_N2_outputs": [ false, [ 78 ] ], "_N3_outputs": [ false, [ 39 ] ], "_N4_outputs": [ false, [ 20 ] ], "_N5_outputs": [ false, [ 10 ] ], "_N6_outputs": [ false, [ 5 ] ], "_N7_outputs": [ false, [ 3 ] ], "_N8_outputs": [ false, [ 2 ] ], "_N0_inputs": [ false, [ 1 ] ] }
It allows you to resume from wherever polygraphy debug left off, e.g. you can run polygraphy debug reduce --load-debug-replay polygraphy_debug_replay.json ... and it will skip ahead to the last iteration in the replay file. You can refer to the help output for details.
Thank you for your work. I still have doubts: if there is more than one error in my model, it seems that final_reduced.onnx can only locate one mistake for me. How can I keep all the wrong layers across the iterations?
That's a little tricky to do because the intended usage of debug reduce is to iteratively fix bugs. That is, you would run debug reduce to create a minimal reproducer, fix the bug, then run it again on the original model to create a reproducer for the next bug. Since the common case is a small number of bugs (usually just one in my experience), this typically works well.
For what you're trying to do, debug reduce would need to explore the remainder of the graph after finding a minimal failing subgraph. You could do something like that with manual effort - e.g. if we consider a simple example graph like:
A -> B -> C -> D -> E
where layers B, C, and D all have separate errors, then debug reduce might initially find a model containing B. After that, you could use surgeon extract to extract the C -> D -> E part of the model and re-run debug reduce on that. Then repeat for the D -> E portion.
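A hedged sketch of one such round (the tensor names C_input and E_output are placeholders for the actual boundary tensors in your model; auto tells Polygraphy to infer the shape and dtype):
polygraphy surgeon extract model.onnx -o c_to_e.onnx --inputs C_input:auto:auto --outputs E_output:auto
polygraphy debug reduce c_to_e.onnx -o reduced.onnx --check polygraphy run polygraphy_debug.onnx --trt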
Another thing to explore would be the --artifacts option, although that also wouldn't give you exactly what you want.
If you did:
polygraphy debug reduce --artifacts polygraphy_debug.onnx ...
This would generate two directories - polygraphy_artifacts/good and polygraphy_artifacts/bad, containing the passing and failing subgraphs from the entire reduction process. You could manually compare all the subgraphs to guess which layers might be problematic.
Hope that helps
@pranavm-nvidia I will try it. Thank you for your reply.
Hi, I have a question: how can I compare every output of two engines with Polygraphy, e.g. an fp16_engine and an fp32_engine?
The outputs of the engine can't be changed once it's built, so you'd need to mark whichever outputs you need while building. After that, you could do:
polygraphy run fp16_engine --trt --model-type engine --save-outputs fp16_outputs.json
then:
polygraphy run fp32_engine --trt --model-type engine --load-outputs fp16_outputs.json
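If you haven't built the engines yet, one way to bake the extra outputs in up front is a sketch like the following (model.onnx and the engine file names are placeholders; note that mark all may hit the bug discussed above, in which case mark a specific set of tensors instead):
polygraphy run model.onnx --trt --trt-outputs mark all --save-engine fp32_engine
polygraphy run model.onnx --trt --fp16 --trt-outputs mark all --save-engine fp16_engine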
Thanks. How do I compare the precision of the outputs of the two engines? By setting --atol and --rtol?
For example, polygraphy run fp32_engine.txt --trt --model-type engine --save-outputs fp32_outputs.json and then polygraphy run fp16_engine.txt --trt --model-type engine --load-outputs fp32_outputs.json --atol 0.1 --rtol 0.1, right? Can this method find which layer of the fp16 engine exceeds the absolute or relative tolerance?
Yes, exactly
Thanks!
Hi, this method only gets the final output data of the engines. Now I want to find out which layer exceeds the tolerance. How can I get the outputs of every layer, or is there another method to compare every layer's outputs across the two engines?
You'd need to mark the outputs you want to compare when you build the engine. As I mentioned, there's no way to retrieve values for tensors which weren't marked as outputs while building the engine. If you're building the engine with Polygraphy, you can use the normal --trt-outputs ... option. If you're building with the API, you can do network.mark_output(...).
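A minimal sketch of the API route (assumptions: the model file is named model.onnx, and you want every layer output marked; in practice you may filter this to just the tensors you care about):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read())

# Mark every layer's output tensors so they can be compared after inference.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    for j in range(layer.num_outputs):
        tensor = layer.get_output(j)
        # Skip tensors that are already outputs; shape tensors would need
        # mark_output_for_shapes, so this simple loop skips them too.
        if tensor.is_network_output or tensor.is_shape_tensor:
            continue
        network.mark_output(tensor)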
Thanks
We have a fix in TensorRT; it will be included in the next release.