Has anyone converted the model to INT8 successfully?
Hi all,
I couldn't convert the model from .pth -> ONNX -> TRT INT8. Has anyone run into the same issue?
Converting the model to FP16 or FP8 works fine, but as soon as I enable INT8 in trtexec, the build ends with "FAILED TensorRT.....".
Possible solutions I tried (the ONNX export and simplification steps are sketched below):
- Using a simplified ONNX model
- Exporting the ONNX model with opset 17 instead of opset 16
- Setting allocationStrategy=runtime in trtexec
None of these solved my case.
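For reference, a minimal sketch of the export and simplification steps. The model-loading helper, input shape, and tensor names here are placeholders, not the exact D-FINE export code:

```python
import torch
import onnx
import onnxsim

# Placeholder: load the trained D-FINE nano model (dfine_hgnetv2_n_coco)
# however your training/export script builds it.
model = build_dfine_model()          # hypothetical helper, not part of D-FINE's API
model.eval()

dummy = torch.randn(1, 3, 640, 640)  # assumed input shape

# Export with opset 17 (opset 16 was also tried)
torch.onnx.export(
    model,
    dummy,
    "dfine_n.onnx",
    opset_version=17,
    input_names=["images"],
    output_names=["outputs"],
)

# Simplify the graph before handing it to trtexec (the --int8 build is where it fails)
model_onnx = onnx.load("dfine_n.onnx")
model_simplified, ok = onnxsim.simplify(model_onnx)
assert ok, "onnx-simplifier could not validate the simplified graph"
onnx.save(model_simplified, "dfine_n_sim.onnx")
```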
My environment:
- TensorRT: 10.5, 10.6.0.26 and 10.8.0.43
- CUDA: 11.8
- DFINE model type: dfine_hgnetv2_n_coco.yml
Many thanks!
-
Yes, I encountered the same problem, following pretty much the same steps as described in the original post.
I believe this is probably an issue with TensorRT 10.8; INT8 quantization of D-FINE was working with TensorRT 8.6.
The model also cannot be quantized with TensorRT 10.3.
-
Maybe with TensorRT 10 you should try the new TensorRT-Model-Optimizer: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/examples/onnx_ptq/README.md
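A rough sketch of that ONNX PTQ workflow, based on the linked README. Treat the parameter names and the calibration-data format as assumptions and check them against your installed modelopt version:

```python
import numpy as np
from modelopt.onnx.quantization import quantize

# Assumption: a batch of preprocessed COCO images saved as a .npy file,
# shaped like the model input (e.g. Nx3x640x640), used for calibration.
calib = np.load("calibration_images.npy")

quantize(
    onnx_path="dfine_n_sim.onnx",
    quantize_mode="int8",
    calibration_data=calib,
    output_path="dfine_n_sim.quant.onnx",
)
# The quantized ONNX (with explicit Q/DQ nodes) is then built into a
# TensorRT engine with trtexec, instead of relying on implicit --int8 calibration.
```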
-
Also, for QAT: https://github.com/NVIDIA/TensorRT/tree/release/10.10/tools/pytorch-quantization notes that "All developers are encouraged to use the TensorRT Model Optimizer to benefit from the latest advancements on quantization and compression."
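For the QAT route, Model Optimizer's PyTorch API looks roughly like this. A sketch only: the config name and calibration loop follow the Model Optimizer docs, and the data loading would need to be adapted to D-FINE's training pipeline:

```python
import modelopt.torch.quantization as mtq

# Assumption: `model` is the D-FINE PyTorch model and `calib_loader`
# yields preprocessed image batches, as in the normal training pipeline.
def forward_loop(m):
    for images, _ in calib_loader:
        m(images)

# Insert quantizers and calibrate with the default INT8 config
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Fine-tune (QAT) with the usual training loop, then export to ONNX
# with opset 17 and build the engine with trtexec as before.
```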