
Set layer precision still doesn't take effect in TensorRT 8.6.1.

Open YouSenRong opened this issue 2 years ago • 21 comments

Description

As I reported in "Skipping tactic 0x0000000000000000 due to Myelin error" degrades performance, setting layer precision could fail in TensorRT 8.4.3 due to the ConstShuffleFusion.

Recently I tried TensorRT 8.6.1, but it seems that setting layer precision may still fail due to the ConstShuffleFusion. For example, as shown in the attached graph, the Max op takes a const input named "phase0_tf/predict_node/y:0", and its value appears to be an fp16 subnormal, so I used the set_precision API to set the layer ("phase0_tf/predict_node/y:0") to fp32 explicitly.
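For reference, a constant becomes an fp16 subnormal when its magnitude falls below the smallest normal fp16 value, 2**-14 ≈ 6.1035e-05. A minimal check (illustrative only, not part of the original report) could look like:

```python
# Hypothetical helper: classify whether a float would be subnormal in fp16.
# The smallest *normal* fp16 magnitude is 2**-14; smaller nonzero values
# can only be represented as subnormals and lose precision.
FP16_MIN_NORMAL = 2.0 ** -14  # ~6.1035e-05

def is_fp16_subnormal(x: float) -> bool:
    return 0.0 < abs(x) < FP16_MIN_NORMAL

print(is_fp16_subnormal(1e-5))  # below the fp16 normal range
print(is_fp16_subnormal(1e-3))  # a normal fp16 value
```

If the constant feeding Max is in this range, forcing the layer to FP32 is the natural workaround, which is what set_precision is meant to do.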

The verbose logs are shown in the attached screenshot.

When the fp16 subnormal constant is not set to fp32, the logs show that the layer "phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]" runs in fp16 precision (see screenshots).

However, even when the fp16 subnormal constant is set to fp32, the layer "phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]" is still fp16 (see screenshot).

By the way, the ConstShuffleFusion produces two kinds of layers, as shown in the attached screenshots.

I am confused about the difference between them. Is that the reason set_precision fails for the layer "phase0_tf/predict_node/y:0"?

Looking forward to your reply. Thanks a lot!

Environment

TensorRT Version: 8.6.1

NVIDIA GPU: T4

NVIDIA Driver Version: 510

CUDA Version: 12.0

CUDNN Version:

Operating System: Ubuntu 20.04

Python Version (if applicable):

Tensorflow Version (if applicable): 1.4

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

YouSenRong avatar Aug 16 '23 11:08 YouSenRong

How do you set the layer precision? Did you set the layer constraint to obey? See https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#abdc74c40fe7a0c3d05d2caeccfbc29c1

zerollzeng avatar Aug 17 '23 14:08 zerollzeng

Thanks for your reply! @zerollzeng

How do you set the layer precision?

I set the precision by calling setPrecision on the layer, as shown in the attached screenshot.

did you set the layer contrain to obey?

Yes, I had set BuilderFlag::kOBEY_PRECISION_CONSTRAINTS, as shown in the attached screenshot. However, it still doesn't work.

For the other layers, setPrecision works. Only setPrecision on the "phase0_tf/predict_node/y:0" layer doesn't take effect.

YouSenRong avatar Aug 18 '23 00:08 YouSenRong

Could you please provide a repro for us? Thanks!

I would prefer an ONNX model that can reproduce this error.

zerollzeng avatar Aug 18 '23 14:08 zerollzeng

Sorry for the late response. @zerollzeng I have split out a subgraph of the model (subgraph.onnx.zip), but I can't reproduce the error on the subgraph. However, I can reproduce the error on the full model. I ran the subgraph and the full model with trtexec based on TensorRT 8.6 with the commands:

./trtexec --onnx=subgraph.onnx --fp16 --verbose --builderOptimizationLevel=3 --layerPrecisions="phase0_tf/predict_node/y:0:fp32,phase0_tf/predict_node:fp32" --layerOutputTypes="phase0_tf/predict_node/y:0:fp32" --precisionConstraints="obey" > subgraph.log 2>&1

./trtexec --onnx=full_model.onnx --fp16 --verbose --builderOptimizationLevel=3 --layerPrecisions="phase0_tf/predict_node/y:0:fp32,phase0_tf/predict_node:fp32" --layerOutputTypes="phase0_tf/predict_node/y:0:fp32" --precisionConstraints="obey" > full_model.log 2>&1

The logs are shown in the attached screenshot. It seems that the tactics used differ between the subgraph and the full model.

Besides, I had set "phase0_tf/predict_node/y:0" and "phase0_tf/predict_node" to fp32, but the warning message still reports an fp16 subnormal in the layer "phase0_tf/predict_node/y:0 + (Unnamed Layer* 522) [Shuffle]" (see screenshots).

For the full model, I may have to ask for permission to share it. Or can I send the full model to you privately instead of posting it publicly on GitHub?

YouSenRong avatar Aug 22 '23 07:08 YouSenRong

@nvpohanh On the right part of the screenshot it's a Myelin subgraph; is it possible that Myelin already set the precision to FP32 but just didn't print it in the log?

zerollzeng avatar Aug 23 '23 12:08 zerollzeng

is it possible that myelin already set the precision to FP32 but just didn't print it in the log?

It can't account for the large difference between pure FP32 and mixed FP32&FP16.

YouSenRong avatar Aug 24 '23 00:08 YouSenRong

Several things I would try:

  1. set the precision of the Concat op before the Max op to FP32 and also set the Concat's output dtype (using set_output_type()) to FP32.
  2. If that doesn't work, add a "Cast" op before Max to cast the Concat's output to FP32, before feeding into Max.
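To illustrate why keeping the value in FP32 matters: round-tripping a subnormal-range constant through fp16 perturbs it, whereas FP32 preserves it. A pure-Python sketch (illustrative only; `struct`'s `'e'` format is IEEE-754 half precision):

```python
import struct

def round_trip_fp16(x: float) -> float:
    # Pack to IEEE-754 half precision ('e') and unpack back to a Python float.
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(round_trip_fp16(0.5))    # exactly representable in fp16
print(round_trip_fp16(1e-5))   # a nearby fp16 subnormal, not exactly 1e-5
```

A Cast to FP32 before Max (suggestion 2) keeps the constant out of this lossy range in the fused region.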

On the right part of the image, it's a myelin subgraph, is it possible that myelin already set the precision to FP32 but just didn't print it in the log?

If the ForeignNode optimization is triggered, we do not have information about the detailed dtype info. We will need to use Nsys to look at it (or use --dumpLayerInfo --profilingVerbosity=detailed with latest TRT internal build).

I think the first thing we should do is to repro the accuracy difference between pure-FP32 and FP32+FP16.

nvpohanh avatar Aug 24 '23 03:08 nvpohanh

@nvpohanh Do you need the full model to reproduce the error?

YouSenRong avatar Aug 25 '23 09:08 YouSenRong

Probably don't need the full model, but need a way to repro the "large difference between pure FP32 and mixed FP32&FP16" you mentioned.

nvpohanh avatar Aug 25 '23 11:08 nvpohanh

Based on TensorRT 8.6, the diff is as follows:
absolute difference: min: 9.02219e-10 (0.000139833, 0.000139832), max: 0.00138001 (0.0436334, 0.0450134), mean: 9.89399e-06
relative difference: min: 5.52027e-06 (0.0022354, 0.00223541), max: 0.141119 (8.68643e-05, 9.91225e-05), mean: 0.00445263
The max relative difference is about 0.14. Based on TensorRT 8.4.3, the max relative difference between FP32 and mixed FP32+FP16 is only about 0.01.
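For context, metrics of this shape can be computed with a small script like the following (a sketch, not the actual measurement code; the sample values are lifted from the max pairs quoted above):

```python
def diff_stats(ref, test):
    """Min/max/mean of absolute and relative differences.

    `ref` holds the pure-FP32 outputs and `test` the FP32+FP16 outputs;
    the relative difference uses the reference value as the denominator
    (assumed convention, matching the quoted numbers).
    """
    abs_diffs = [abs(r - t) for r, t in zip(ref, test)]
    rel_diffs = [abs(r - t) / abs(r) for r, t in zip(ref, test) if r != 0.0]
    mean = lambda xs: sum(xs) / len(xs)
    return ((min(abs_diffs), max(abs_diffs), mean(abs_diffs)),
            (min(rel_diffs), max(rel_diffs), mean(rel_diffs)))

# Sample pairs taken from the reported max entries.
ref  = [0.0436334, 0.000139833, 8.68643e-05]
test = [0.0450134, 0.000139832, 9.91225e-05]
(_, abs_max, _), (_, rel_max, _) = diff_stats(ref, test)
print(abs_max)  # ~0.00138
print(rel_max)  # ~0.141119
```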

For the repro, I try to save the input data.

YouSenRong avatar Aug 27 '23 07:08 YouSenRong

A similar issue: https://github.com/NVIDIA/TensorRT/issues/3257

zerollzeng avatar Aug 27 '23 13:08 zerollzeng

@zerollzeng is this dup of #3257 ? thanks

ttyio avatar Oct 10 '23 20:10 ttyio

@zerollzeng is this dup of #3257 ? thanks

Maybe not.

absolute difference: min: 9.02219e-10 (0.000139833, 0.000139832), max: 0.00138001 (0.0436334, 0.0450134), mean: 9.89399e-06
relative difference: min: 5.52027e-06 (0.0022354, 0.00223541), max: 0.141119 (8.68643e-05, 9.91225e-05), mean: 0.00445263
The max relative difference is about 0.14. Based on TensorRT 8.4.3, the max relative difference between FP32 and mixed FP32+FP16 is only about 0.01.

The diff doesn't look very big in either case. What is the output data range?

zerollzeng avatar Oct 12 '23 14:10 zerollzeng

what is the out data range?

What does the output data range mean? I tried both the "enqueueV2" and "enqueueV3" APIs, but all the results have big diffs. I am organizing the details.

YouSenRong avatar Oct 13 '23 01:10 YouSenRong

The output data range. E.g. if the range is [-1, 1], then the diff (max 0.001) looks good to me.

zerollzeng avatar Oct 18 '23 14:10 zerollzeng

The data range is [0, 1], but the relative difference is too big, and I think it is caused by setting the layer precision to FP32 not taking effect. Besides, the max is not always 0.001; sometimes it is bigger. In cases where setting the layer precision takes effect, the diff is small. But in cases where setting the layer precision doesn't take effect, the diff is big.

YouSenRong avatar Oct 18 '23 14:10 YouSenRong

But the relative difference is too big

If the TRT output has the value 0.000001 and the ONNX output has the value 0.000002, then you will see a relative difference of 1. Have you tried Po-Han's suggestion to set the layer precision?
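The point can be made concrete with a two-line example (the denominator is assumed to be the TRT output here):

```python
# Tiny outputs make a negligible absolute error look like a huge relative one.
trt_out, onnx_out = 0.000001, 0.000002
abs_diff = abs(trt_out - onnx_out)   # ~1e-06: negligible in absolute terms
rel_diff = abs_diff / abs(trt_out)   # ~1.0: looks like a 100% error
```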

zerollzeng avatar Oct 20 '23 14:10 zerollzeng

If the TRT output has the value 0.000001 and the ONNX output has the value 0.000002, then you will see a relative difference of 1.

Yes, I understand, but the absolute diff is not so small.

Have you try Po-Han's suggestion to set the layer precision?

Yes, I set the layer precision following Po-Han's suggestion, but it still doesn't take effect.

YouSenRong avatar Oct 21 '23 02:10 YouSenRong

Okay, I think we need a repro to debug this issue further.

zerollzeng avatar Oct 22 '23 03:10 zerollzeng

Taking the result of TensorFlow (in FP32) as the reference and comparing against TRT 8.4, TRT 8.6, and TRT 9.1 (in FP16), 10 samples are as follows:

TF(FP32) vs TRT8.4(FP16) diff_tf_trt8.4.txt

TF(FP32) vs TRT8.6(FP16) diff_tf_trt8.6.txt

TF(FP32) vs TRT9.1(FP16) diff_tf_trt9.1.txt

These data show that the diffs of TRT 8.6 and TRT 9.1 are bigger.

Besides, with set_precision the diff between TF (FP32) and TRT 8.4 (FP16) can be reduced: diff_tf_trt8.4-set_precision.txt. But set_precision has no effect in TRT 8.6 and TRT 9.1.

YouSenRong avatar Nov 07 '23 04:11 YouSenRong

Okay, I think we need a repro to debug this issue further.

@zerollzeng We’ve recently encountered a nearly identical issue in our model deployment.

If possible, could you share any directions you’ve identified? Thanks.

xjy1995 avatar Apr 15 '25 13:04 xjy1995