TensorRT
FP16 model built with TensorRT 10.0 produces incorrect results on a Tesla T4 GPU
Description
Using trtexec from TensorRT 10 to convert an ONNX model to a TensorRT engine, the FP32 model produces correct results, but the FP16 model does not. I have set almost all layers to FP32 with trtexec --precisionConstraints=obey --builderOptimizationLevel=5 --layerPrecisions="/Transpose":fp32,"/intro_/Conv":fp32,"/intro_down/Conv":fp32,......., but the results are still incorrect. Could you help me solve this problem?
Environment
TensorRT Version: TensorRT 10.0.1
NVIDIA GPU: Tesla T4
NVIDIA Driver Version: 450.36.06
CUDA Version: 11.0
CUDNN Version: 8.0.0
Operating System:
ONNX opset: 17
Relevant Files
ONNX model link: https://drive.google.com/file/d/14zuubyXVVN-mOJ2b64jPc128dj4VRU_C/view?usp=sharing
Steps To Reproduce
trtexec --onnx=$pr_nolog_model_path --fp16 --device=0 --minShapes=input:1x128x128x3 --optShapes=input:1x1920x1920x3 --maxShapes=input:1x3072x3072x3 --saveEngine=ysDeblur_cc75_t4_fp16_small_dyn.trtmodel --layerPrecisions="/Transpose":fp32,"/intro_/Conv":fp32,"/intro_down/Conv":fp32,(so many) --precisionConstraints=obey --builderOptimizationLevel=5
Try the following command:
polygraphy run xxxx.onnx --trt --onnxrt --fp16 \
--trt-outputs mark all \
--onnx-outputs mark all
Here are the results: log_netg.txt @lix19937
Hi, sorry to bother you, but is there any update on the solution? @lix19937
@yflv-yanxia Sorry for the late reply. From my build log:
[08/20/2024-20:57:50] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[08/20/2024-20:57:50] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[08/20/2024-20:57:50] [W] [TRT] Check verbose logs for the list of affected weights.
[08/20/2024-20:57:50] [W] [TRT] - 111 weights are affected by this issue: Detected subnormal FP16 values.
[08/20/2024-20:57:50] [V] [TRT] List of affected weights: /decoders.0/decoders.0.0/conv1/Conv.weight, /decoders.0/decoders.0.0/conv2/Conv.weight, /decoders.0/decoders.0.0/conv3/Conv + decoders.0.0.beta + /decoders.0/decoders.0.0/Mul_1 + /decoders.0/decoders.0.0/Add.weight, /decoders.0/decoders.0.0/conv4/Conv.weight, /decoders.0/decoders.0.0/conv5/Conv + decoders.0.0.gamma + /decoders.0/decoders.0.0/Mul_2 + /decoders.0/decoders.0.0/Add_1.bias, /decoders.0/decoders.0.0/conv5/Conv + decoders.0.0.gamma + /decoders.0/decoders.0.0/Mul_2 + /decoders.0/decoders.0.0/Add_1.weight, /decoders.0/decoders.0.0/sca/sca.1/Conv.weight, /decoders.1/decoders.1.0/conv1/Conv.weight, /decoders.1/decoders.1.0/conv2/Conv.weight, /decoders.1/decoders.1.0/conv3/Conv + decoders.1.0.beta + /decoders.1/decoders.1.0/Mul_1 + /decoders.1/decoders.1.0/Add.weight, /decoders.1/decoders.1.0/conv4/Conv.weight, /decoders.1/decoders.1.0/conv5/Conv + decoders.1.0.gamma + /decoders.1/decoders.1.0/Mul_2 + /decoders.1/decoders.1.0/Add_1.weight, /decoders.1/decoders.1.0/sca/sca.1/Conv.weight, /decoders.2/decoders.2.0/conv2/Conv.weight, /decoders.2/decoders.2.0/conv3/Conv + decoders.2.0.beta + /decoders.2/decoders.2.0/Mul_1 + /decoders.2/decoders.2.0/Add.weight, /decoders.2/decoders.2.0/conv5/Conv + decoders.2.0.gamma + /decoders.2/decoders.2.0/Mul_2 + /decoders.2/decoders.2.0/Add_1.weight, /decoders.3/decoders.3.0/conv4/Conv.weight, /decoders.3/decoders.3.0/conv5/Conv + decoders.3.0.gamma + /decoders.3/decoders.3.0/Mul_2 + /decoders.3/decoders.3.0/Add_1.weight, /downs.0/Conv.weight, /downs.1/Conv.weight, /downs.2/Conv.weight, /downs.3/Conv.weight, /encoders.0/encoders.0.0/sca/sca.1/Conv.weight, /encoders.1/encoders.1.0/conv3/Conv + encoders.1.0.beta + /encoders.1/encoders.1.0/Mul_1 + /encoders.1/encoders.1.0/Add.weight, /encoders.2/encoders.2.0/conv1/Conv.weight, /encoders.2/encoders.2.0/conv2/Conv.bias, /encoders.2/encoders.2.0/conv3/Conv + encoders.2.0.beta + /encoders.2/encoders.2.0/Mul_1 + 
/encoders.2/encoders.2.0/Add.weight, /encoders.2/encoders.2.0/conv4/Conv.weight, /encoders.2/encoders.2.0/conv5/Conv + encoders.2.0.gamma + /encoders.2/encoders.2.0/Mul_2 + /encoders.2/encoders.2.0/Add_1.weight, /encoders.2/encoders.2.0/sca/sca.1/Conv.weight, /encoders.3/encoders.3.0/conv1/Conv.weight, /encoders.3/encoders.3.0/conv3/Conv + encoders.3.0.beta + /encoders.3/encoders.3.0/Mul_1 + /encoders.3/encoders.3.0/Add.weight, /encoders.3/encoders.3.0/conv4/Conv.weight, /encoders.3/encoders.3.0/conv5/Conv + encoders.3.0.gamma + /encoders.3/encoders.3.0/Mul_2 + /encoders.3/encoders.3.0/Add_1.bias, /encoders.3/encoders.3.0/conv5/Conv + encoders.3.0.gamma + /encoders.3/encoders.3.0/Mul_2 + /encoders.3/encoders.3.0/Add_1.weight, /encoders.3/encoders.3.0/sca/sca.1/Conv.weight, /encoders.3/encoders.3.1/conv1/Conv.weight, /encoders.3/encoders.3.1/conv2/Conv.weight, /encoders.3/encoders.3.1/conv3/Conv + encoders.3.1.beta + /encoders.3/encoders.3.1/Mul_1 + /encoders.3/encoders.3.1/Add.weight, /encoders.3/encoders.3.1/conv4/Conv.weight, /encoders.3/encoders.3.1/conv5/Conv + encoders.3.1.gamma + /encoders.3/encoders.3.1/Mul_2 + /encoders.3/encoders.3.1/Add_1.weight, /encoders.3/encoders.3.1/sca/sca.1/Conv.weight, /encoders.3/encoders.3.2/conv1/Conv.weight, /encoders.3/encoders.3.2/conv2/Conv.weight, /encoders.3/encoders.3.2/conv3/Conv + encoders.3.2.beta + /encoders.3/encoders.3.2/Mul_1 + /encoders.3/encoders.3.2/Add.weight, /encoders.3/encoders.3.2/conv4/Conv.weight, /encoders.3/encoders.3.2/conv5/Conv + encoders.3.2.gamma + /encoders.3/encoders.3.2/Mul_2 + /encoders.3/encoders.3.2/Add_1.weight, /encoders.3/encoders.3.2/sca/sca.1/Conv.weight, /encoders.3/encoders.3.3/conv1/Conv.bias, /encoders.3/encoders.3.3/conv1/Conv.weight, /encoders.3/encoders.3.3/conv2/Conv.weight, /encoders.3/encoders.3.3/conv3/Conv + encoders.3.3.beta + /encoders.3/encoders.3.3/Mul_1 + /encoders.3/encoders.3.3/Add.weight, /encoders.3/encoders.3.3/conv4/Conv.weight, /encoders.3/encoders.3.3/conv5/Conv 
+ encoders.3.3.gamma + /encoders.3/encoders.3.3/Mul_2 + /encoders.3/encoders.3.3/Add_1.weight, /encoders.3/encoders.3.3/sca/sca.1/Conv.weight, /encoders.3/encoders.3.4/conv1/Conv.weight, /encoders.3/encoders.3.4/conv2/Conv.weight, /encoders.3/encoders.3.4/conv3/Conv + encoders.3.4.beta + /encoders.3/encoders.3.4/Mul_1 + /encoders.3/encoders.3.4/Add.weight, /encoders.3/encoders.3.4/conv4/Conv.weight, /encoders.3/encoders.3.4/conv5/Conv + encoders.3.4.gamma + /encoders.3/encoders.3.4/Mul_2 + /encoders.3/encoders.3.4/Add_1.weight, /encoders.3/encoders.3.4/sca/sca.1/Conv.weight, /encoders.3/encoders.3.5/conv1/Conv.weight, /encoders.3/encoders.3.5/conv2/Conv.weight, /encoders.3/encoders.3.5/conv3/Conv + encoders.3.5.beta + /encoders.3/encoders.3.5/Mul_1 + /encoders.3/encoders.3.5/Add.bias, /encoders.3/encoders.3.5/conv3/Conv + encoders.3.5.beta + /encoders.3/encoders.3.5/Mul_1 + /encoders.3/encoders.3.5/Add.weight, /encoders.3/encoders.3.5/conv4/Conv.bias, /encoders.3/encoders.3.5/conv4/Conv.weight, /encoders.3/encoders.3.5/conv5/Conv + encoders.3.5.gamma + /encoders.3/encoders.3.5/Mul_2 + /encoders.3/encoders.3.5/Add_1.weight, /encoders.3/encoders.3.5/sca/sca.1/Conv.weight, /encoders.3/encoders.3.6/conv1/Conv.weight, /encoders.3/encoders.3.6/conv2/Conv.weight, /encoders.3/encoders.3.6/conv3/Conv + encoders.3.6.beta + /encoders.3/encoders.3.6/Mul_1 + /encoders.3/encoders.3.6/Add.weight, /encoders.3/encoders.3.6/conv4/Conv.weight, /encoders.3/encoders.3.6/conv5/Conv + encoders.3.6.gamma + /encoders.3/encoders.3.6/Mul_2 + /encoders.3/encoders.3.6/Add_1.weight, /encoders.3/encoders.3.6/sca/sca.1/Conv.weight, /encoders.3/encoders.3.7/conv1/Conv.weight, /encoders.3/encoders.3.7/conv3/Conv + encoders.3.7.beta + /encoders.3/encoders.3.7/Mul_1 + /encoders.3/encoders.3.7/Add.weight, /encoders.3/encoders.3.7/conv4/Conv.weight, /encoders.3/encoders.3.7/conv5/Conv + encoders.3.7.gamma + /encoders.3/encoders.3.7/Mul_2 + /encoders.3/encoders.3.7/Add_1.weight, 
/encoders.3/encoders.3.7/sca/sca.1/Conv.weight, /encoders.3/encoders.3.8/conv1/Conv.weight, /encoders.3/encoders.3.8/conv3/Conv + encoders.3.8.beta + /encoders.3/encoders.3.8/Mul_1 + /encoders.3/encoders.3.8/Add.weight, /encoders.3/encoders.3.8/conv4/Conv.weight, /encoders.3/encoders.3.8/conv5/Conv + encoders.3.8.gamma + /encoders.3/encoders.3.8/Mul_2 + /encoders.3/encoders.3.8/Add_1.weight, /encoders.3/encoders.3.8/sca/sca.1/Conv.weight, /encoders.3/encoders.3.9/conv1/Conv.weight, /encoders.3/encoders.3.9/conv3/Conv + encoders.3.9.beta + /encoders.3/encoders.3.9/Mul_1 + /encoders.3/encoders.3.9/Add.bias, /encoders.3/encoders.3.9/conv3/Conv + encoders.3.9.beta + /encoders.3/encoders.3.9/Mul_1 + /encoders.3/encoders.3.9/Add.weight, /encoders.3/encoders.3.9/conv4/Conv.weight, /encoders.3/encoders.3.9/conv5/Conv + encoders.3.9.gamma + /encoders.3/encoders.3.9/Mul_2 + /encoders.3/encoders.3.9/Add_1.weight, /encoders.3/encoders.3.9/sca/sca.1/Conv.weight, /ending_/Conv.weight, /ending_up/ending_up.0/Conv.weight, /middle_blks/middle_blks.0/conv1/Conv.weight, /middle_blks/middle_blks.0/conv2/Conv.weight, /middle_blks/middle_blks.0/conv3/Conv + middle_blks.0.beta + /middle_blks/middle_blks.0/Mul_1 + /middle_blks/middle_blks.0/Add.bias, /middle_blks/middle_blks.0/conv3/Conv + middle_blks.0.beta + /middle_blks/middle_blks.0/Mul_1 + /middle_blks/middle_blks.0/Add.weight, /middle_blks/middle_blks.0/conv4/Conv.weight, /middle_blks/middle_blks.0/conv5/Conv + middle_blks.0.gamma + /middle_blks/middle_blks.0/Mul_2 + /middle_blks/middle_blks.0/Add_1.bias, /middle_blks/middle_blks.0/conv5/Conv + middle_blks.0.gamma + /middle_blks/middle_blks.0/Mul_2 + /middle_blks/middle_blks.0/Add_1.weight, /middle_blks/middle_blks.0/sca/sca.1/Conv.bias, /middle_blks/middle_blks.0/sca/sca.1/Conv.weight, /ups.0/ups.0.0/Conv.weight, /ups.1/ups.1.0/Conv.weight, /ups.2/ups.2.0/Conv.weight, /ups.3/ups.3.0/Conv.weight, encoders.3.5.norm1.weight + /encoders.3/encoders.3.5/norm1/Mul + 
encoders.3.5.norm1.bias + /encoders.3/encoders.3.5/norm1/Add_1.shift, encoders.3.8.norm1.weight + /encoders.3/encoders.3.8/norm1/Mul + encoders.3.8.norm1.bias + /encoders.3/encoders.3.8/norm1/Add_1.shift, middle_blks.0.norm1.weight + /middle_blks/middle_blks.0/norm1/Mul + middle_blks.0.norm1.bias + /middle_blks/middle_blks.0/norm1/Add_1.scale, middle_blks.0.norm1.weight + /middle_blks/middle_blks.0/norm1/Mul + middle_blks.0.norm1.bias + /middle_blks/middle_blks.0/norm1/Add_1.shift, middle_blks.0.norm2.weight + /middle_blks/middle_blks.0/norm2/Mul + middle_blks.0.norm2.bias + /middle_blks/middle_blks.0/norm2/Add_1.shift
[08/20/2024-20:57:50] [W] [TRT] - 4 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
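For context, the warning above fires when a weight's magnitude falls below the smallest positive normal FP16 value (2^-14, about 6.1e-5), so after the FP32-to-FP16 cast it can only be stored as a subnormal with reduced precision (or flushed to zero). A minimal sketch in plain Python (hypothetical helper, not part of TensorRT) illustrates the thresholds involved:

```python
# FP16 (IEEE 754 binary16) range constants
FP16_MIN_NORMAL = 2.0 ** -14   # smallest positive normal value, ~6.1e-5
FP16_MAX = 65504.0             # largest finite value

def classify_for_fp16(x: float) -> str:
    """Classify how a weight value survives an FP32 -> FP16 cast."""
    a = abs(x)
    if a > FP16_MAX:
        return "overflow"      # becomes +/-Inf in FP16
    if 0.0 < a < FP16_MIN_NORMAL:
        return "subnormal"     # representable only with reduced precision
    return "ok"

print(classify_for_fp16(3e-6))   # subnormal
```

Running a check like this over the ONNX initializers named in the log is one way to confirm which weights TensorRT is warning about.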
You can check whether each of your Conv layers is followed by a BN op; folding BN into Conv rescales the weights and can push them into the subnormal FP16 range.
I also encountered this problem: I set layer_precision and layer_output_type to kFLOAT for every layer that can be set under FP16 mode, but some inference results were still wrong (the outputs were all ones).
I eventually found that it may come from the reformatting of the input. Under FP16 mode, the input is first reformatted into FP16, even though the following layer runs in kFLOAT precision and its output type is kFLOAT.
Here is part of the engine graph of our model.
Is there any way to disable inserting the reformat-to-FP16 layer under FP16 mode? Thanks! @lix19937
If you can make your input data type fp16 (in the preprocess phase, cast the image data from fp32 to fp16),
then build with trtexec --inputIOFormats=fp16:chw,fp16:chw,fp16:chw,fp16:chw --outputIOFormats=fp16:chw; in the postprocess phase, the output is fp16.
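As a rough sketch of that preprocessing step (hypothetical helper names, stdlib only; Python's struct module supports the IEEE half-precision "e" format), packing fp32 pixel data into the raw fp16 buffer that --inputIOFormats=fp16:chw expects might look like:

```python
import struct

def to_fp16_buffer(values):
    """Pack Python floats into a raw fp16 buffer (2 bytes per value)."""
    return struct.pack(f"{len(values)}e", *values)

def from_fp16_buffer(buf):
    """Unpack a raw fp16 output buffer back into Python floats."""
    return list(struct.unpack(f"{len(buf) // 2}e", buf))

# Round-trip of values exactly representable in FP16
buf = to_fp16_buffer([0.5, 1.0, -2.0])
assert from_fp16_buffer(buf) == [0.5, 1.0, -2.0]
```

Note that struct.pack raises OverflowError for magnitudes above the FP16 finite range, which also makes it a cheap guard against feeding overflowing inputs to an fp16 engine.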
@lix19937 Thanks for your reply. I finally found that the problem was caused by FP16 overflow in some of the inputs. After we fixed the overflow, the accuracy of the results was as expected.
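For anyone hitting the same symptom, a quick way to screen input tensors for FP16 overflow before the cast (a minimal sketch, not the reporter's actual fix; clamping is just one possible remedy alongside rescaling the inputs) is:

```python
FP16_MAX = 65504.0  # largest finite FP16 value

def fp16_overflow_indices(values):
    """Indices of values whose magnitude exceeds the finite FP16 range."""
    return [i for i, v in enumerate(values) if abs(v) > FP16_MAX]

def clamp_for_fp16(values):
    """Clamp into [-FP16_MAX, FP16_MAX] so the cast yields finite values."""
    return [max(-FP16_MAX, min(FP16_MAX, v)) for v in values]

print(fp16_overflow_indices([1.0, 7.0e4, -1.0e5]))  # [1, 2]
```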
Closing this issue as it seems like it is resolved. If not, feel free to re-open.