
Subnormal FP16 value detected

deephog opened this issue 2 years ago

When using TRT 8.4 to generate a model with FP16 precision, the following warning occurred:

[07/27/2022-23:16:56] [W] [TRT] Weights [name=Conv_13703 + Add_13709 + onnx::Mul_4732_clone_3 + (Unnamed Layer* 7047) [Shuffle] + Mul_13729.weight] had the following issues when converted to FP16:
[07/27/2022-23:16:56] [W] [TRT] - Subnormal FP16 values detected.
[07/27/2022-23:16:56] [W] [TRT] - Values less than smallest positive FP16 Subnormal value detected. Converting to FP16 minimum subnormalized value.
[07/27/2022-23:16:56] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.

It turns out the output of the FP16 model is very different from that of the FP32 model.

I tried TRT 8.2.5, which doesn't give any of these warnings, but the results were still wrong.

I am aware that we can keep certain layers in FP32 to prevent this issue, but the warning occurs for most of my layers, which means I cannot benefit from the speed of FP16 at all.

I also tried training the model with mixed precision (which I later realized wouldn't help, since the weights are still saved in FP32) and clamping my weights after each training iteration (to the range 5.96e-8 to 65504). It still failed with the same results.
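
As a reference, a minimal sketch of such a clamping step, assuming a PyTorch training loop (the framework and the helper name clamp_weights_to_fp16_range are assumptions, not taken from the original post; here the clamp uses the normal FP16 minimum rather than the subnormal minimum):

import torch

FP16_MIN_NORMAL = 6.1e-5   # smallest normal positive FP16 magnitude (~2**-14)
FP16_MAX = 65504.0         # largest finite FP16 value

@torch.no_grad()
def clamp_weights_to_fp16_range(model: torch.nn.Module) -> None:
    # Clamp weight magnitudes into the normal FP16 range, keeping signs
    # and leaving exact zeros untouched.
    for p in model.parameters():
        clamped = torch.sign(p) * p.abs().clamp(FP16_MIN_NORMAL, FP16_MAX)
        p.copy_(torch.where(p == 0, p, clamped))

# e.g. call clamp_weights_to_fp16_range(model) after optimizer.step()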

So is there any good practice during training or TRT generation that can help with the situation?

deephog avatar Jul 28 '22 00:07 deephog

I did a few more tests.

It turns out that even if I clamp the weights of the original model to 6.1e-5 (the lower bound of the normal FP16 value range), the outputs are still good as long as I run the model in FP32 precision. So the error is most likely introduced by the weight type casting.
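
A quick way to see which weights are affected is to scan the ONNX initializers for values below the FP16 thresholds. The following is a minimal sketch, assuming the onnx and numpy packages; test.onnx is a placeholder path:

import numpy as np
import onnx
from onnx import numpy_helper

FP16_MIN_NORMAL = 6.1e-5      # ~2**-14, smallest normal FP16 magnitude
FP16_MIN_SUBNORMAL = 5.96e-8  # ~2**-24, smallest subnormal FP16 magnitude

model = onnx.load("test.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init).astype(np.float32)
    mags = np.abs(w[w != 0])
    if mags.size == 0:
        continue
    subnormal = int(((mags < FP16_MIN_NORMAL) & (mags >= FP16_MIN_SUBNORMAL)).sum())
    flushed = int((mags < FP16_MIN_SUBNORMAL).sum())
    if subnormal or flushed:
        print(f"{init.name}: {subnormal} subnormal in FP16, {flushed} below the FP16 subnormal minimum")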

So, is there any possible way to do FP16 calibration?

deephog avatar Jul 28 '22 07:07 deephog

I'm trying to use the precision debugging tool provided by Polygraphy. There is no direct example of how to use it, and there are a ton of arguments.

polygraphy debug precision test.onnx --precision-constraints="none" --mode=linear --workspace=8590000000 --log-file=precision_debug.log --tf32 --fp16

After the engine is compiled, it asks me: "Did it pass or fail?". I understand that I can load debug.engine and check the results manually, but is there any way to automatically check the results against a "golden.json", like the other examples do?

deephog avatar Jul 29 '22 02:07 deephog

@pranavm-nvidia ^ ^

zerollzeng avatar Jul 30 '22 02:07 zerollzeng

@deephog You can automate it using the --check option, similar to debug reduce. For example:

  1. Dump out golden outputs with:
polygraphy run test.onnx --onnxrt --save-outputs golden.json
  2. Use --check:
polygraphy debug precision test.onnx --mode=linear \
    --workspace=8.5G --log-file=precision_debug.log --tf32 --fp16 \
    --check \
        polygraphy run polygraphy_debug.engine --trt --load-outputs golden.json

Also see polygraphy debug -h for details on how the debug tools work.

It would also be good to double check whether this is a limitation of the model itself or a bug in TensorRT. As in this example, you can convert the model to FP16 and then validate it in ONNX-Runtime:

polygraphy convert test.onnx --fp-to-fp16 -o test_fp16.onnx 
# Check for NaNs/Infs
polygraphy run test_fp16.onnx --onnxrt --validate
# Check accuracy against previously dumped golden outputs
polygraphy run test_fp16.onnx --onnxrt --load-outputs golden.json

pranavm-nvidia avatar Aug 01 '22 13:08 pranavm-nvidia

I see! Thanks a lot

deephog avatar Aug 01 '22 16:08 deephog

I think it is a combination of onnx-tensorrt and TensorRT versions. In my environment, any combination of TensorRT and onnx-tensorrt newer than the one below always degrades FP16 accuracy. Incidentally, even a network that contains only a single simple Convolution degrades in versions newer than TensorRT 8.4.0. So I downgraded from CUDA 11.7 to CUDA 11.6, downgraded TensorRT from 8.4.3 to 8.4.0, and also changed onnx-tensorrt back to commit 9f82b2b6072be6c01f65306388e5c07621d3308f.

I am not sure if this information is useful to you.

TensorRT 8.4.1-8.4.3 + onnx-tensorrt 8.4-GA -> accuracy degrades

$ dpkg -l | grep TensorRT

ii  graphsurgeon-tf        8.4.0-1+cuda11.6   amd64 GraphSurgeon for TensorRT package
ii  libnvinfer-bin         8.4.0-1+cuda11.6   amd64 TensorRT binaries
ii  libnvinfer-dev         8.4.0-1+cuda11.6   amd64 TensorRT development libraries and headers
ii  libnvinfer-doc         8.4.0-1+cuda11.6   all   TensorRT documentation
ii  libnvinfer-plugin-dev  8.4.0-1+cuda11.6   amd64 TensorRT plugin libraries
ii  libnvinfer-plugin8     8.4.0-1+cuda11.6   amd64 TensorRT plugin libraries
ii  libnvinfer-samples     8.4.0-1+cuda11.6   all   TensorRT samples
ii  libnvinfer8            8.4.0-1+cuda11.6   amd64 TensorRT runtime libraries
ii  libnvonnxparsers-dev   8.4.0-1+cuda11.6   amd64 TensorRT ONNX libraries
ii  libnvonnxparsers8      8.4.0-1+cuda11.6   amd64 TensorRT ONNX libraries
ii  libnvparsers-dev       8.4.0-1+cuda11.6   amd64 TensorRT parsers libraries
ii  libnvparsers8          8.4.0-1+cuda11.6   amd64 TensorRT parsers libraries
ii  onnx-graphsurgeon      8.4.0-1+cuda11.6   amd64 ONNX GraphSurgeon for TensorRT package
ii  python3-libnvinfer     8.4.0-1+cuda11.6   amd64 Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev 8.4.0-1+cuda11.6   amd64 Python 3 development package for TensorRT
ii  tensorrt               8.4.0.6-1+cuda11.6 amd64 Meta package of TensorRT
ii  uff-converter-tf       8.4.0-1+cuda11.6   amd64 UFF converter for TensorRT package

$ git clone --recursive https://github.com/onnx/onnx-tensorrt \
    && cd onnx-tensorrt \
    && git checkout 9f82b2b6072be6c01f65306388e5c07621d3308f \
    && mkdir build \
    && cd build \
    && cmake .. -DTENSORRT_ROOT=/usr/src/tensorrt \
    && make -j$(nproc) \
    && make install

Ref: https://forums.developer.nvidia.com/t/subnormal-fp16-values-detected/220070

PINTO0309 avatar Sep 08 '22 01:09 PINTO0309

I also encountered the same problem. I manually set the layers with the weight warning to FP32, and the warning disappeared. However, the accuracy of the result is still degraded.

YouSenRong avatar Oct 26 '22 01:10 YouSenRong

@YouSenRong Setting those layers back to FP32 just solves the subnormal value issue, but FP16 indeed has less accuracy than FP32 due to fewer mantissa bits. If you set all layers to FP32, can you observe the accuracy drop as well? If yes, then it might be an accuracy bug.

zerollzeng avatar Oct 26 '22 11:10 zerollzeng
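
As a reference for doing this through the API rather than trtexec, here is a minimal sketch of building an FP16 engine while pinning selected layers to FP32 with the TensorRT Python API (TRT 8.x assumed; the layer-name set and the test.onnx path are placeholders, not taken from this thread):

import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)
KEEP_FP32 = {"Conv_13703"}  # placeholder: fill with the layers named in the subnormal warnings

builder = trt.Builder(LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, LOGGER)
with open("test.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the builder honor the per-layer constraints set below (TRT >= 8.2).
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name in KEEP_FP32:
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)

From the command line, trtexec's --layerPrecisions/--layerOutputTypes flags (mentioned later in this thread) express the same constraints.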

There is still a little difference; the absolute differences between FP32 and FP16 (|FP32 - FP16|) are 2.08711E-08, 2.73753E-05, and 4.54187E-05.

And I hit another problem:

[W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: Mismatched type for tensor '(Unnamed Layer* 670) [Shuffle]_output', f32 vs. expected type:f16.
[E] [TRT] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 1) [Constant] + (Unnamed Layer* 2) [Shuffle]...phase0_tf/predict_node]}.)

It seems that the output type of "(Unnamed Layer* 670)" is unexpected. The error disappears if I don't set the layer precision of "phase0_tf/task_9_fc_0/bias:0", which is the layer preceding "(Unnamed Layer* 670)".

YouSenRong avatar Oct 26 '22 14:10 YouSenRong

Did you apply quantization to the model? I only see this kind of mismatch when I do quantization.

deephog avatar Oct 26 '22 17:10 deephog

Do you mean INT8 or FP16 quantization? I just want to use FP16 mixed precision; I am not quantizing the model with INT8. And I can get a close result in FP16 by setting the layers to FP32, although there are still some layers that cause problems when I set them to FP32.

I found that each of these problem layers is a constant layer (kCONSTANT type) followed by a shuffle layer (kSHUFFLE type), and TensorRT tries to apply ConstantShuffleFusion to them. I guess it is this fusion mechanism that leads to the problem for me.

YouSenRong avatar Oct 27 '22 01:10 YouSenRong
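
To check that hypothesis, one can walk the parsed network and list Constant layers whose output feeds a Shuffle layer; those are the candidates for ConstantShuffleFusion that may conflict with per-layer precision constraints. A minimal sketch with the TensorRT Python API (the helper name is an assumption, not an official API):

import tensorrt as trt

def find_constant_shuffle_pairs(network: trt.INetworkDefinition):
    # Map each Constant layer's output tensor name to the layer name.
    const_outputs = {}
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type == trt.LayerType.CONSTANT:
            const_outputs[layer.get_output(0).name] = layer.name

    # Report Shuffle layers that consume one of those constant outputs.
    pairs = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type != trt.LayerType.SHUFFLE:
            continue
        for j in range(layer.num_inputs):
            tensor = layer.get_input(j)
            if tensor is not None and tensor.name in const_outputs:
                pairs.append((const_outputs[tensor.name], layer.name))
    return pairs

# e.g. for c, s in find_constant_shuffle_pairs(network): print(c, "->", s)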

@YouSenRong Can you give the command for how to set the layers to FP32?

zll0000 avatar Jan 13 '23 10:01 zll0000

There is still a little difference; the absolute differences between FP32 and FP16 (|FP32 - FP16|) are 2.08711E-08, 2.73753E-05, and 4.54187E-05.

It's expected; the diff is very small.

"[E] [TRT] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 1) [Constant] + (Unnamed Layer* 2) [Shuffle]...phase0_tf/predict_node]}.)"

Could you try the latest TRT version? We made a lot of enhancements for those cases.

zerollzeng avatar Jan 13 '23 14:01 zerollzeng

The TensorRT version I use is 8.5.2.2.

zll0000 avatar Jan 13 '23 15:01 zll0000

@zerollzeng

zll0000 avatar Jan 14 '23 00:01 zll0000

@zll0000 Did you hit the same error? Can you share your ONNX here so that I can take a look? Thanks!

zerollzeng avatar Jan 16 '23 12:01 zerollzeng

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!

ttyio avatar Feb 14 '23 01:02 ttyio

@YouSenRong Can you give the command for how to set the layers to FP32?

trtexec supports the command-line flags --layerPrecisions and --layerOutputTypes to set layer precision; see "A.2.1.4. Commonly Used Command-line Flags" in the documentation. You can refer to it. @zll0000

YouSenRong avatar Feb 22 '23 12:02 YouSenRong

We have encountered the same problem: the output accuracy is very different after exporting the QDQ ONNX model to a TRT engine.
Mine is a super-resolution model. With QDQ + onnxruntime the result can be aligned with trt+fp16, but when the QDQ ONNX is exported to TRT (using the ppq export tool: https://github.com/openppl-public/ppq/blob/master/ppq/utils/TensorRTUtil.py#L212), the result is abnormal.

zt706 avatar Jun 06 '23 08:06 zt706

Hi @zt706, did you solve it? I encountered the same problem with a super-resolution model.

zimenglan-sysu-512 avatar Jul 17 '23 02:07 zimenglan-sysu-512