
Has anyone converted the model to INT8 successfully?

Open horngjason opened this issue 9 months ago • 3 comments

Hi all,

I couldn't convert the model from .pth -> ONNX -> TRT INT8. Has anyone run into the same situation?

Converting the model to FP16 or FP8 succeeds. Yet, once I set INT8 in trtexec, what I get in the end is a "FAILED TensorRT.....".

[Image: trtexec INT8 build failure log]
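Roughly, the builds look like this (a sketch, not my exact command lines; paths are placeholders). The FP16 build succeeds while the INT8 build fails:

```bash
# FP16 build -- succeeds
trtexec --onnx=model.onnx --saveEngine=model_fp16.engine --fp16

# INT8 build -- fails with "FAILED TensorRT....."
trtexec --onnx=model.onnx --saveEngine=model_int8.engine --int8
```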

Possible solutions I tried are:

  • Using a simplified ONNX model
  • Exporting the ONNX model with opset 17 instead of opset 16
  • Setting allocationStrategy=runtime in trtexec

None of them solved my case. A sketch of the export/simplify step is below.
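This is a minimal sketch of the export + simplify workaround, not the D-FINE repo's actual export script; the Conv2d is a stand-in placeholder for the model loaded from the .pth checkpoint:

```python
# Minimal sketch of the ONNX export + simplify workaround.
import onnx
import torch
from onnxsim import simplify

model = torch.nn.Conv2d(3, 16, 3)  # placeholder for the loaded D-FINE model
dummy = torch.randn(1, 3, 640, 640)

# Export with opset 17 (I also tried opset 16).
torch.onnx.export(model, dummy, "model.onnx", opset_version=17)

# onnx-simplifier pass.
model_simp, ok = simplify(onnx.load("model.onnx"))
assert ok, "onnx-simplifier validation failed"
onnx.save(model_simp, "model_sim.onnx")
```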

My environment:

  • TensorRT: 10.5, 10.6.0.26 and 10.8.0.43
  • CUDA: 11.8
  • D-FINE model type: dfine_hgnetv2_n_coco.yml

Many thanks!

horngjason avatar Mar 05 '25 01:03 horngjason

Yes, I encountered the same problem, following pretty much the same steps as described in the original post.

I believe this is probably an issue with TensorRT 10.8; INT8 quantization with D-FINE was working with TensorRT 8.6.

fdarvas avatar Apr 22 '25 23:04 fdarvas

This model also cannot be quantized with TensorRT 10.3.

fdarvas avatar Apr 29 '25 21:04 fdarvas

  • With TensorRT 10, maybe you should try the new TensorRT Model Optimizer: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/examples/onnx_ptq/README.md (a rough PTQ sketch follows below)

  • The same goes for QAT: https://github.com/NVIDIA/TensorRT/tree/release/10.10/tools/pytorch-quantization states "All developers are encouraged to use the TensorRT Model Optimizer to benefit from the latest advancements on quantization and compression."
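For what it's worth, the ONNX PTQ path looks roughly like this. This is a hedged sketch based on the linked onnx_ptq README; argument names such as quantize_mode and calibration_data may differ between modelopt releases, so treat it as a starting point only:

```python
# Hedged sketch of TensorRT Model Optimizer ONNX PTQ; exact API may vary
# across modelopt versions.
import numpy as np
from modelopt.onnx.quantization import quantize

# Calibration inputs shaped like the model input; random data is only a
# placeholder -- real COCO images should be used for meaningful scales.
calib = np.random.rand(32, 3, 640, 640).astype(np.float32)

quantize(
    onnx_path="model_sim.onnx",
    quantize_mode="int8",
    calibration_data=calib,
    output_path="model_int8.onnx",
)
```

The output ONNX should carry explicit Q/DQ nodes, so the trtexec engine build would then use the scales baked into the graph rather than trtexec-side INT8 calibration.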

olibartfast avatar May 22 '25 08:05 olibartfast