[Help Needed] Convert ONNX into fp16 Engine
Description
I tried to convert a DAT model into TensorRT format at fp16 precision, but when I perform inference with it, it only produces nan.
Environment
TensorRT Version: 10.0.1.6
NVIDIA GPU: RTX 3060
NVIDIA Driver Version: 551.61
CUDA Version: 12.3
CUDNN Version: 8.9.7.29
Operating System: Windows 11
Relevant Files
Model Link: 4x-Nomos8kDAT (.onnx format)
Steps To Reproduce
- Convert the model into fp16 engine:
trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-fp16.trt --shapes=input:1x3x128x128 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
- Perform Inference
  - Code Used: https://github.com/Haoming02/TensorRT-Cpp/tree/bf16
  - Replace every __nv_bfloat16 with half; and cuda_bf16.h with cuda_fp16.h
- See only a pure black output
  - When adding a debug log to the outputData, it simply prints nan (see the sketch after this list)
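For reference, the host-side check looked roughly like this. This is a minimal sketch rather than the repo's exact code; countNan is just an illustrative name, and it assumes outputData is the std::vector<half> filled from the engine's output binding after the __nv_bfloat16 → half edit described above.

```cpp
// Minimal sketch: count NaN values in the fp16 output buffer after it has
// been copied back to the host.
#include <cuda_fp16.h>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumes `outputData` was filled via cudaMemcpy from the engine's output binding.
static std::size_t countNan(const std::vector<half>& outputData) {
    std::size_t nanCount = 0;
    for (const half& v : outputData) {
        if (std::isnan(__half2float(v)))  // widen to fp32 before testing
            ++nanCount;
    }
    return nanCount;
}
```

With the fp16 engine, every element came back as nan; with the bf16 and fp32 engines the buffer contains normal pixel values.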
Misc
Interestingly, if I convert the model into bf16 precision with the following:
trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-bf16.trt --shapes=input:1x3x128x128 --inputIOFormats=bf16:chw --outputIOFormats=bf16:chw --bf16
and use the above code to perform inference, the output works correctly. So only fp16 causes nan issues...
- The model size for fp32 is ~120 MB; for fp16 it is ~70 MB; for bf16 it is ~100 MB
- The inference speed is similar between fp32 and bf16, but almost twice as fast for fp16
Previously, I also tried using TensorRT 8.6 to convert the model. When specifying the fp16 flag, it would print out some warnings about inaccuracy. However, these warnings were not present when converting the model using TensorRT 10.0.
Forgot to mention, but full precision also works correctly
trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT.trt --shapes=input:1x3x128x128 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw
Just tried another model: 2x-ModernSpanimationV1, and adding the --fp16 flag still works correctly.
So probably a certain operator within the DAT architecture is causing the nan?
This looks like fp16 overflow or underflow; check your trtexec build log by adding --verbose.
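If the verbose log points at particular layers, one possible workaround is to keep those layers in fp32 while the rest of the network builds in fp16. The following is a hedged sketch using the TensorRT C++ builder API, not something from this thread; pinSuspectLayersToFp32 is an illustrative name, and targeting Softmax layers is only an assumption (softmax in attention blocks is a common fp16 overflow culprit).

```cpp
// Hedged sketch: pin suspected overflow-prone layers (here: Softmax) to fp32
// while the rest of the engine is still built with fp16 enabled.
#include <NvInfer.h>

void pinSuspectLayersToFp32(nvinfer1::INetworkDefinition& network,
                            nvinfer1::IBuilderConfig& config) {
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
    // Without this flag the builder is free to ignore per-layer precision requests.
    config.setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

    for (int i = 0; i < network.getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network.getLayer(i);
        if (layer->getType() == nvinfer1::LayerType::kSOFTMAX) {
            layer->setPrecision(nvinfer1::DataType::kFLOAT);      // run the layer in fp32
            layer->setOutputType(0, nvinfer1::DataType::kFLOAT);  // keep its output fp32
        }
    }
}
```

If I remember correctly, trtexec exposes the same idea through --precisionConstraints and --layerPrecisions, but check trtexec --help for your version.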
@Haoming02 Could you re-link the inference code? The link seems broken
inference code: https://github.com/Haoming02/TensorRT-Cpp
I've since tried other upscaler models, and they worked fine. So it's most likely that the DAT architecture does not like fp16 precision...
@Haoming02 Hmm yeah that could be possible. In general, especially for "modern" neural networks, we recommend using the strongly typed mode in combination with calibrated quantization or quantization-aware training, instead of letting the engine decide the precision (read more here). You may also find this accuracy debugging page useful.
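For what it's worth, here is a minimal sketch of the strongly typed mode with the TensorRT 10 C++ API; buildStronglyTyped is an illustrative name, and the logger, error handling, and object cleanup are omitted.

```cpp
// Hedged sketch: build a strongly typed network, where tensor dtypes come from
// the ONNX model itself instead of builder precision flags.
#include <NvInfer.h>
#include <NvOnnxParser.h>

nvinfer1::IHostMemory* buildStronglyTyped(nvinfer1::ILogger& logger,
                                          const char* onnxPath) {
    auto* builder = nvinfer1::createInferBuilder(logger);
    auto flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kSTRONGLY_TYPED);
    auto* network = builder->createNetworkV2(flags);

    auto* parser = nvonnxparser::createParser(*network, logger);
    parser->parseFromFile(onnxPath,
        static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto* config = builder->createBuilderConfig();
    // No kFP16/kBF16 flags here: precisions follow the types stored in the model.
    return builder->buildSerializedNetwork(*network, *config);
}
```

In this mode the fp16/bf16 decision has to be made on the model side (e.g. via calibrated quantization or QAT), which is the point of the recommendation above; I believe trtexec exposes it as --stronglyTyped, but verify against your version's --help.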
How do you do quantization-aware training in PyTorch? We used AMP training but failed to convert the checkpoint to TensorRT fp16; during inference, nan is generated due to value overflow.