
[Help Needed] Convert ONNX into fp16 Engine

Open Haoming02 opened this issue 1 year ago • 3 comments

Description

I tried to convert a DAT model into TensorRT format at fp16 precision, but when I perform inference on it, it only produces NaN.

Environment

TensorRT Version: 10.0.1.6

NVIDIA GPU: RTX 3060

NVIDIA Driver Version: 551.61

CUDA Version: 12.3

CUDNN Version: 8.9.7.29

Operating System

Windows 11

Relevant Files

Model Link: 4x-Nomos8kDAT (.onnx format)

Steps To Reproduce

  1. Convert the model into an fp16 engine:
trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-fp16.trt --shapes=input:1x3x128x128 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16
  2. Perform inference

    • Code used: https://github.com/Haoming02/TensorRT-Cpp/tree/bf16
    • Replace every __nv_bfloat16 with half, and cuda_bf16.h with cuda_fp16.h
  3. See only a pure black output

    • Adding a debug log to the outputData shows that it simply prints nan (a minimal sketch of such a check is shown below)
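
For reference, here is a minimal sketch of the kind of NaN check described in step 3, assuming the fp16 output has already been copied back from the device into a host std::vector<__half>. The names outputData and countNaN are illustrative and not taken from the linked repository; compile with nvcc (or a host compiler with the CUDA headers on the include path).

    // Minimal NaN check on an fp16 output buffer (a sketch; not the repo's actual code).
    #include <cuda_fp16.h>   // __half, __half2float, __float2half
    #include <cmath>         // std::isnan
    #include <cstdio>
    #include <vector>

    // Counts NaN values in a host-side fp16 buffer (e.g. after cudaMemcpy of the output binding).
    static size_t countNaN(const std::vector<__half>& outputData) {
        size_t nanCount = 0;
        for (const __half& v : outputData) {
            if (std::isnan(__half2float(v)))  // promote to float before testing
                ++nanCount;
        }
        return nanCount;
    }

    int main() {
        // Dummy buffer standing in for the engine output (1x3x512x512 for a 4x upscale of a 128x128 input).
        std::vector<__half> outputData(3 * 512 * 512, __float2half(0.5f));
        std::printf("NaN values: %zu / %zu\n", countNaN(outputData), outputData.size());
        return 0;
    }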

Misc

Interestingly, if I convert the model into bf16 precision with the following:

trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-bf16.trt --shapes=input:1x3x128x128 --inputIOFormats=bf16:chw --outputIOFormats=bf16:chw --bf16

and then use the above code to perform inference, the output is correct. So only fp16 causes NaN issues...

  • The model size is ~120 MB for fp32, ~70 MB for fp16, and ~100 MB for bf16
  • The inference speed is similar between fp32 and bf16, but almost twice as fast with fp16

Previously, I also tried using TensorRT 8.6 to convert the model. When specifying the fp16 flag, it would print out some warnings about inaccuracy. However, these warnings were not present when converting the model using TensorRT 10.0.

Haoming02 avatar Jun 04 '24 07:06 Haoming02

Forgot to mention, but full precision also works correctly

trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT.trt --shapes=input:1x3x128x128 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw

Haoming02 avatar Jun 04 '24 07:06 Haoming02

Just tried another model: 2x-ModernSpanimationV1, and adding the --fp16 flag still works correctly.

So probably a certain operator within the DAT architecture is causing the nan?

Haoming02 avatar Jun 05 '24 03:06 Haoming02

Likely an fp16 overflow or underflow; check your trtexec build log by adding --verbose.
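
For reference, the same build command with --verbose captures the builder's per-layer log, which helps narrow down where the overflow occurs:

trtexec --onnx=4xNomos8kDAT.onnx --saveEngine=4xNomos8kDAT-fp16.trt --shapes=input:1x3x128x128 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --fp16 --verbose > build.log 2>&1

If a particular layer turns out to overflow, trtexec can keep it in fp32 while the rest of the network stays in fp16; if I remember the option syntax correctly (check trtexec --help for your version), that is --precisionConstraints=obey --layerPrecisions=<layerName>:fp32.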

lix19937 avatar Jun 07 '24 01:06 lix19937

@Haoming02 Could you re-link the inference code? The link seems broken

LeoZDong avatar Feb 12 '25 01:02 LeoZDong

inference code

https://github.com/Haoming02/TensorRT-Cpp

I've since tried other upscaler models, and they worked fine. So it's most likely that the DAT architecture does not like fp16 precision...

Haoming02 avatar Feb 14 '25 10:02 Haoming02

@Haoming02 Hmm yeah that could be possible. In general, especially for "modern" neural networks, we recommend using the strongly typed mode in combination with calibrated quantization or quantization-aware training, instead of letting the engine decide the precision (read more here). You may also find this accuracy debugging page useful.
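
For example, in a strongly typed build the precision of every tensor comes from the types in the ONNX model itself rather than from builder flags, so the model would first need to be exported or converted with fp16 types; if I recall the flag correctly, trtexec exposes this mode as --stronglyTyped, and the usual --fp16/--bf16 flags no longer apply there (check trtexec --help for your version). For locating the layer where fp16 diverges, Polygraphy's basic comparison against ONNX Runtime is often the quickest start (double-check the flags for your Polygraphy version):

polygraphy run 4xNomos8kDAT.onnx --trt --fp16 --onnxrt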

LeoZDong avatar Feb 14 '25 22:02 LeoZDong

How do we do quantization-aware training in PyTorch? We used AMP training but failed to convert the checkpoint to TensorRT fp16; during inference, NaN values are generated due to value overflow.

JohnHerry avatar Jul 18 '25 01:07 JohnHerry