FP16 engine works fine with trtexec but produces NaN with the Python API
Description
I am trying to create a TensorRT engine from an ONNX model:
trtexec --onnx=model.onnx --saveEngine=engine.trt --fp16
When I use trtexec for inference, it works fine:
trtexec --loadEngine=engine.trt --fp16 --exportOutput=fp16e.json
I can see that the output in fp16e.json is just fine.
But when I try to run this engine file with the Python API, the output is NaN.
Environment
TensorRT Version: 8.6
NVIDIA GPU: RTX 3070
NVIDIA Driver Version: 535.154.05
CUDA Version: 12.2
CUDNN Version: 8.9
Operating System:
Python Version (if applicable): 3.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Here is the Python script I am using: tensorrt_test_fps.py.txt
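For reference, a minimal TensorRT 8.x inference loop looks roughly like this (a sketch of the usual pattern, not the attached script verbatim; the host buffers must use the dtype each binding reports, which is a common source of NaN/garbage output when ignored):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("engine.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate buffers with the dtype the engine actually expects.
bindings, host_bufs = [], []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    shape = tuple(context.get_binding_shape(i))
    host = np.zeros(shape, dtype=dtype)  # all-zero input, the failing case here
    dev = cuda.mem_alloc(host.nbytes)
    bindings.append(int(dev))
    host_bufs.append((host, dev, engine.binding_is_input(i)))

# Copy inputs, run, copy outputs back, and check for NaNs.
for host, dev, is_input in host_bufs:
    if is_input:
        cuda.memcpy_htod(dev, np.ascontiguousarray(host))
context.execute_v2(bindings)
for host, dev, is_input in host_bufs:
    if not is_input:
        cuda.memcpy_dtoh(host, dev)
        print("NaNs in output:", np.isnan(host).any())
```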
Have you tried the latest release?: yes
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
Yes, it runs fine when using the CUDAExecutionProvider with ONNXRuntime, but with the TensorrtExecutionProvider I get the same issue.
I am not sure if the values are overflowing.
- How can I check whether any activation is overflowing? (See the sketch below.)
- Why does trtexec work fine while only the Python API returns NaN values?
One thing I found about the NaNs: with random inputs the model does not output NaN, but with all-zero inputs it does. What does trtexec use as input for the model?
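One way to check for activation overflow (a sketch, assuming a float-only graph; model_all_outputs.onnx is a hypothetical temp file): expose every intermediate tensor as a graph output, run once in FP32 with ONNX Runtime, and flag tensors whose magnitude exceeds the FP16 range:

```python
import numpy as np
import onnx
import onnxruntime as ort

FP16_MAX = float(np.finfo(np.float16).max)  # 65504

model = onnx.load("model.onnx")
existing = {o.name for o in model.graph.output}
# Expose every intermediate tensor as a graph output so we can inspect it.
for node in model.graph.node:
    for name in node.output:
        if name and name not in existing:
            model.graph.output.append(
                onnx.helper.make_tensor_value_info(name, onnx.TensorProto.FLOAT, None))
            existing.add(name)
onnx.save(model, "model_all_outputs.onnx")

sess = ort.InferenceSession("model_all_outputs.onnx",
                            providers=["CPUExecutionProvider"])
# All-zero inputs, since that is the case that produces NaNs here.
feeds = {
    i.name: np.zeros([d if isinstance(d, int) else 1 for d in i.shape],
                     dtype=np.float32)
    for i in sess.get_inputs()
}
for name, val in zip([o.name for o in sess.get_outputs()], sess.run(None, feeds)):
    peak = float(np.abs(val).max())
    if peak > FP16_MAX or np.isnan(val).any():
        print(f"{name}: max |x| = {peak:.3e}")
```

Polygraphy can do a similar per-tensor comparison end to end with --onnx-outputs mark all and --trt-outputs mark all.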
Could you please try to validate the output with polygraphy run model.onnx --trt --fp16 --onnxrt?
See https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy
The output of this command is:
[E] FAILED | Output: '37293' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Mismatched outputs: ['output', '37293']
[E] Accuracy Summary | trt-runner-N0-02/13/24-17:35:13 vs. onnxrt-runner-N0-02/13/24-17:35:13 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 461.335s |
On increasing the tolerance (--atol 0.1 --rtol 0.1), it passes:
[I] PASSED | Output: '37293' | Difference is within tolerance (rel=0.1, abs=0.1)
[I] PASSED | All outputs matched | Outputs: ['output', '37293']
[I] Accuracy Summary | trt-runner-N0-02/13/24-17:55:15 vs. onnxrt-runner-N0-02/13/24-17:55:15 | Passed: 1/1 iterations | Pass Rate: 100.0%
This happened to me because of the model weights: some were smaller than the FP16 minimum. Those weights get flushed to zero, and then, because of some divisions by zero, NaNs appear. To confirm whether the problem stems from your weights, try an export without --fp16 (plain FP32). If that fixes the NaNs, retrain your model with FP16 precision before exporting to ONNX.
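A quick way to check the weights for this (a sketch; it flags initializers whose smallest nonzero magnitude falls below the smallest normal FP16 value, where values start losing precision or flushing to zero):

```python
import numpy as np
import onnx
from onnx import numpy_helper

FP16_TINY = float(np.finfo(np.float16).tiny)  # ~6.1e-5, smallest normal FP16

model = onnx.load("model.onnx")
for init in model.graph.initializer:
    w = numpy_helper.to_array(init)
    if w.dtype not in (np.float32, np.float64):
        continue  # only float weights are affected by the FP16 cast
    nz = np.abs(w[w != 0])
    if nz.size and nz.min() < FP16_TINY:
        print(f"{init.name}: min nonzero |w| = {nz.min():.3e} "
              f"(may flush toward zero in FP16)")
```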
I previously had little knowledge about debugging the internals of a TensorRT engine, but I am now able to debug using additional bindings to find the source of the overflow/underflow. My original issue was knowing how to debug this.
Meanwhile, I found that using the --best flag instead of --fp16 works fine: I can see meaningful output from the model. It seems to automatically keep the layers that need to stay in FP32; the --fp16 flag appears to be more restrictive. A rough Python-API equivalent is sketched below.
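For anyone building from the Python API, here is a rough sketch of enabling FP16 overall while pinning suspect layers back to FP32 (the name-based filter below is a hypothetical placeholder, and this is not how --best actually selects precisions):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make TensorRT honor the per-layer precisions set below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Hypothetical filter: pin division/normalization layers to FP32.
    if "Div" in layer.name or "Softmax" in layer.name:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

with open("engine_mixed.trt", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```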
But I agree with you @Data-Iab that to get a good model we need to retrain it.