
Subnormal FP16 value detected

deephog opened this issue 2 years ago

When using TRT 8.4 to generate a model with FP16 precision, the following warning occurred:

[07/27/2022-23:16:56] [W] [TRT] Weights [name=Conv_13703 + Add_13709 + onnx::Mul_4732_clone_3 + (Unnamed Layer* 7047) [Shuffle] + Mul_13729.weight] had the following issues when converted to FP16:
[07/27/2022-23:16:56] [W] [TRT] - Subnormal FP16 values detected.
[07/27/2022-23:16:56] [W] [TRT] - Values less than smallest positive FP16 Subnormal value detected. Converting to FP16 minimum subnormalized value.
[07/27/2022-23:16:56] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to reduce the magnitude of the weights.

It turns out the output of the FP16 model is very different from that of the FP32 model.

I tried TRT 8.2.5, which doesn't give any of these warnings, but the results were still wrong.

I am aware that we can keep certain layers in FP32 to prevent this issue, but the warning occurs for most of my layers, which means I cannot benefit from the speed of FP16 at all.

I also tried training the model with mixed precision (which I later realized wouldn't help, since the weights are still saved in FP32) and clamping my weights after each training iteration (to the range 5.96e-8 to 65504). It still failed with the same results.
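
As a reference, a minimal sketch of such a clamping step, assuming a PyTorch training loop (the framework and the helper name clamp_weights_to_fp16_range are assumptions, not taken from the original post; here the clamp uses the normal FP16 minimum rather than the subnormal minimum):

import torch

FP16_MIN_NORMAL = 6.1e-5   # smallest normal positive FP16 magnitude (~2**-14)
FP16_MAX = 65504.0         # largest finite FP16 value

@torch.no_grad()
def clamp_weights_to_fp16_range(model: torch.nn.Module) -> None:
    # Clamp weight magnitudes into the normal FP16 range, keeping signs
    # and leaving exact zeros untouched.
    for p in model.parameters():
        clamped = torch.sign(p) * p.abs().clamp(FP16_MIN_NORMAL, FP16_MAX)
        p.copy_(torch.where(p == 0, p, clamped))

# e.g. call clamp_weights_to_fp16_range(model) after optimizer.step()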

So is there any good practice during training or TRT generation that can help with the situation?

deephog avatar Jul 28 '22 00:07 deephog

I did a few more tests.

It turns out that even if I clamp the weights of the original model to 6.1e-5 (the lower bound of the normal FP16 value range), the outputs are still good as long as I run the model in FP32 precision. So the error is most likely introduced by the weight type casting.
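
A quick way to see which weights are affected is to scan the ONNX initializers for values below the FP16 thresholds. The following is a minimal sketch, assuming the onnx and numpy packages; test.onnx is a placeholder path:

import numpy as np
import onnx
from onnx import numpy_helper

FP16_MIN_NORMAL = 6.1e-5      # ~2**-14, smallest normal FP16 magnitude
FP16_MIN_SUBNORMAL = 5.96e-8  # ~2**-24, smallest subnormal FP16 magnitude

model = onnx.load("test.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init).astype(np.float32)
    mags = np.abs(w[w != 0])
    if mags.size == 0:
        continue
    subnormal = int(((mags < FP16_MIN_NORMAL) & (mags >= FP16_MIN_SUBNORMAL)).sum())
    flushed = int((mags < FP16_MIN_SUBNORMAL).sum())
    if subnormal or flushed:
        print(f"{init.name}: {subnormal} subnormal in FP16, {flushed} below the FP16 subnormal minimum")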

So, is there any possible way to do FP16 calibration?

deephog avatar Jul 28 '22 07:07 deephog

I'm trying to use the precision debugging tool provided by Polygraphy. There is no direct example of how to use it, and there are a ton of arguments.

polygraphy debug precision test.onnx --precision-constraints="none" --mode=linear --workspace=8590000000 --log-file=precision_debug.log --tf32 --fp16

After the engine is compiled, it asks me: "Did it pass or fail?". I understand that I can load debug.engine and check the results manually, but is there any way to automatically check the results against a "golden.json", like the other examples do?

deephog avatar Jul 29 '22 02:07 deephog

@pranavm-nvidia ^ ^

zerollzeng avatar Jul 30 '22 02:07 zerollzeng

@deephog You can automate it using the --check option, similar to debug reduce. For example:

  1. Dump out golden outputs with:
polygraphy run test.onnx --onnxrt --save-outputs golden.json
  2. Use --check:
polygraphy debug precision test.onnx --mode=linear \
    --workspace=8.5G --log-file=precision_debug.log --tf32 --fp16 \
    --check \
        polygraphy run polygraphy_debug.engine --trt --load-outputs golden.json

Also see polygraphy debug -h for details on how the debug tools work.

It would also be good to double check whether this is a limitation of the model itself or a bug in TensorRT. As in this example, you can convert the model to FP16 and then validate it in ONNX-Runtime:

polygraphy convert test.onnx --fp-to-fp16 -o test_fp16.onnx 
# Check for NaNs/Infs
polygraphy run test_fp16.onnx --onnxrt --validate
# Check accuracy against previously dumped golden outputs
polygraphy run test_fp16.onnx --onnxrt --load-outputs golden.json

pranavm-nvidia avatar Aug 01 '22 13:08 pranavm-nvidia

I see! Thanks a lot

deephog avatar Aug 01 '22 16:08 deephog

I think it is a combination of onnx-tensorrt and TensorRT versions. In my environment, any combination of TensorRT and onnx-tensorrt newer than the one below always degrades FP16 accuracy. Incidentally, even a network that contains only a single simple Convolution degrades in versions newer than TensorRT 8.4.0. So I downgraded from CUDA 11.7 to CUDA 11.6, downgraded TensorRT from 8.4.3 to 8.4.0, and also changed onnx-tensorrt back to commit 9f82b2b6072be6c01f65306388e5c07621d3308f.

I am not sure if this information is useful to you.

TensorRT 8.4.1-8.4.3 + onnx-tensorrt 8.4-GA -> accuracy degrades

$ dpkg -l | grep TensorRT

ii  graphsurgeon-tf        8.4.0-1+cuda11.6   amd64 GraphSurgeon for TensorRT package
ii  libnvinfer-bin         8.4.0-1+cuda11.6   amd64 TensorRT binaries
ii  libnvinfer-dev         8.4.0-1+cuda11.6   amd64 TensorRT development libraries and headers
ii  libnvinfer-doc         8.4.0-1+cuda11.6   all   TensorRT documentation
ii  libnvinfer-plugin-dev  8.4.0-1+cuda11.6   amd64 TensorRT plugin libraries
ii  libnvinfer-plugin8     8.4.0-1+cuda11.6   amd64 TensorRT plugin libraries
ii  libnvinfer-samples     8.4.0-1+cuda11.6   all   TensorRT samples
ii  libnvinfer8            8.4.0-1+cuda11.6   amd64 TensorRT runtime libraries
ii  libnvonnxparsers-dev   8.4.0-1+cuda11.6   amd64 TensorRT ONNX libraries
ii  libnvonnxparsers8      8.4.0-1+cuda11.6   amd64 TensorRT ONNX libraries
ii  libnvparsers-dev       8.4.0-1+cuda11.6   amd64 TensorRT parsers libraries
ii  libnvparsers8          8.4.0-1+cuda11.6   amd64 TensorRT parsers libraries
ii  onnx-graphsurgeon      8.4.0-1+cuda11.6   amd64 ONNX GraphSurgeon for TensorRT package
ii  python3-libnvinfer     8.4.0-1+cuda11.6   amd64 Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev 8.4.0-1+cuda11.6   amd64 Python 3 development package for TensorRT
ii  tensorrt               8.4.0.6-1+cuda11.6 amd64 Meta package of TensorRT
ii  uff-converter-tf       8.4.0-1+cuda11.6   amd64 UFF converter for TensorRT package

$ git clone --recursive https://github.com/onnx/onnx-tensorrt \
    && cd onnx-tensorrt \
    && git checkout 9f82b2b6072be6c01f65306388e5c07621d3308f \
    && mkdir build \
    && cd build \
    && cmake .. -DTENSORRT_ROOT=/usr/src/tensorrt \
    && make -j$(nproc) \
    && make install

Ref: https://forums.developer.nvidia.com/t/subnormal-fp16-values-detected/220070

PINTO0309 avatar Sep 08 '22 01:09 PINTO0309

I also encountered the same problem. I manually set the layers with the weight warning to FP32, and the warning disappeared. However, the accuracy of the result is still degraded.

YouSenRong avatar Oct 26 '22 01:10 YouSenRong

@YouSenRong Setting those layers back to FP32 just solves the subnormal value issue, but FP16 indeed has less accuracy than FP32 due to fewer mantissa bits. If you set all layers to FP32, can you observe the accuracy drop as well? If yes, then it might be an accuracy bug.

zerollzeng avatar Oct 26 '22 11:10 zerollzeng
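
As a reference for doing this through the API rather than trtexec, here is a minimal sketch of building an FP16 engine while pinning selected layers to FP32 with the TensorRT Python API (TRT 8.x assumed; the layer-name set and the test.onnx path are placeholders, not taken from this thread):

import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)
KEEP_FP32 = {"Conv_13703"}  # placeholder: fill with the layers named in the subnormal warnings

builder = trt.Builder(LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, LOGGER)
with open("test.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the builder honor the per-layer constraints set below (TRT >= 8.2).
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name in KEEP_FP32:
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)

From the command line, trtexec's --layerPrecisions/--layerOutputTypes flags (mentioned later in this thread) express the same constraints.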

There is still a little difference; the absolute differences between FP32 and FP16 (|FP32 - FP16|) are 2.08711E-08, 2.73753E-05, and 4.54187E-05.

And I hit another problem:

[W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: Mismatched type for tensor '(Unnamed Layer* 670) [Shuffle]_output', f32 vs. expected type:f16.
[E] [TRT] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 1) [Constant] + (Unnamed Layer* 2) [Shuffle]...phase0_tf/predict_node]}.)

It seems that the output type of "(Unnamed Layer* 670)" is unexpected. The error disappears if I don't set the layer precision of "phase0_tf/task_9_fc_0/bias:0", which is the layer preceding "(Unnamed Layer* 670)".

YouSenRong avatar Oct 26 '22 14:10 YouSenRong

Did you apply quantization to the model? I only see this kind of mismatch when I do quantization.

deephog avatar Oct 26 '22 17:10 deephog

Do you mean INT8 or FP16 quantization? I just want to use FP16 mixed precision; I am not quantizing the model with INT8. And I can get a close result in FP16 by setting the layers to FP32, although there are still some layers that cause problems when I set them to FP32.

I found that each of these problem layers is a constant layer (kCONSTANT type) followed by a shuffle layer (kSHUFFLE type), and TensorRT tries to apply ConstantShuffleFusion to them. I guess it is this fusion mechanism that leads to the problem for me.

YouSenRong avatar Oct 27 '22 01:10 YouSenRong
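
To check that hypothesis, one can walk the parsed network and list Constant layers whose output feeds a Shuffle layer; those are the candidates for ConstantShuffleFusion that may conflict with per-layer precision constraints. A minimal sketch with the TensorRT Python API (the helper name is an assumption, not an official API):

import tensorrt as trt

def find_constant_shuffle_pairs(network: trt.INetworkDefinition):
    # Map each Constant layer's output tensor name to the layer name.
    const_outputs = {}
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type == trt.LayerType.CONSTANT:
            const_outputs[layer.get_output(0).name] = layer.name

    # Report Shuffle layers that consume one of those constant outputs.
    pairs = []
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type != trt.LayerType.SHUFFLE:
            continue
        for j in range(layer.num_inputs):
            tensor = layer.get_input(j)
            if tensor is not None and tensor.name in const_outputs:
                pairs.append((const_outputs[tensor.name], layer.name))
    return pairs

# e.g. for c, s in find_constant_shuffle_pairs(network): print(c, "->", s)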

@YouSenRong Can you give the command for how to set the layers to FP32?

zll0000 avatar Jan 13 '23 10:01 zll0000

There is still a little difference; the absolute differences between FP32 and FP16 (|FP32 - FP16|) are 2.08711E-08, 2.73753E-05, and 4.54187E-05.

It's expected; the diff is very small.

"[E] [TRT] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[(Unnamed Layer* 1) [Constant] + (Unnamed Layer* 2) [Shuffle]...phase0_tf/predict_node]}.)"

Could you try the latest TRT version? We made a lot of enhancements for those cases.

zerollzeng avatar Jan 13 '23 14:01 zerollzeng

The TensorRT version I use is 8.5.2.2.

zll0000 avatar Jan 13 '23 15:01 zll0000

@zerollzeng

zll0000 avatar Jan 14 '23 00:01 zll0000

@zll0000 Did you hit the same error? Can you share your ONNX here so that I can take a look? Thanks!

zerollzeng avatar Jan 16 '23 12:01 zerollzeng

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks!

ttyio avatar Feb 14 '23 01:02 ttyio

@YouSenRong Can you give the command for how to set the layers to FP32?

trtexec supports the command-line flags --layerPrecisions and --layerOutputTypes to set layer precision; see "A.2.1.4. Commonly Used Command-line Flags" in the documentation. You can refer to it. @zll0000

YouSenRong avatar Feb 22 '23 12:02 YouSenRong

We have encountered the same problem: the output accuracy is very different after exporting the QDQ ONNX model to a TRT engine.
Mine is a super-resolution model. With QDQ + onnxruntime the result can be aligned with trt+fp16, but when the QDQ ONNX is exported to TRT (using the ppq export tool: https://github.com/openppl-public/ppq/blob/master/ppq/utils/TensorRTUtil.py#L212), the result is abnormal.

zt706 avatar Jun 06 '23 08:06 zt706

Hi @zt706, did you solve it? I encountered the same problem with a super-resolution model.

zimenglan-sysu-512 avatar Jul 17 '23 02:07 zimenglan-sysu-512