
Improving int8 quantization results.

Open severecoder opened this issue 1 year ago • 3 comments

I have used PTQ for int8 export from a PyTorch model, and despite several attempts at calibration there is a significant drop in detection accuracy.

I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

The end goal is a .trt/engine file running inference at int8 precision with the best possible detection metrics.
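
For context, my current PTQ flow looks roughly like this. It's only a sketch, assuming TensorRT 8.x with pycuda; `model.onnx`, `calib_batches` (an iterable of preprocessed float32 NCHW arrays), and the cache path are placeholders:

```python
import os
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed calibration batches to the builder."""
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)      # float32 NCHW arrays
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None                   # calibration finished
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)  # let sensitive layers fall back to fp16
config.int8_calibrator = EntropyCalibrator(calib_batches)

with open("model_int8.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```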

TIA

severecoder avatar May 15 '24 01:05 severecoder

I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

pytorch_quantization will be deprecated; please use AMMO now.
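
For anyone still on pytorch_quantization before migrating, the usual QAT recipe looks roughly like this. A sketch only; `build_model`, `calib_loader`, and `dummy_input` are placeholders:

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()    # patch torch layers with quantized variants
model = build_model().cuda()  # Conv/Linear now carry TensorQuantizers

# 1) Collect activation statistics with quantization disabled.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()
with torch.no_grad():
    for images, _ in calib_loader:  # a few hundred representative batches
        model(images.cuda())

# 2) Load amax values and switch to fake-quantized forward passes.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.load_calib_amax()
        module.disable_calib()
        module.enable_quant()

# 3) Fine-tune briefly at a small learning rate (the actual QAT step),
#    then export ONNX with Q/DQ nodes for TensorRT.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
torch.onnx.export(model, dummy_input, "model_qat.onnx", opset_version=13)
```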

zerollzeng avatar May 17 '24 12:05 zerollzeng

Thanks for the response. Isn't AMMO limited to LLMs?

severecoder avatar May 20 '24 22:05 severecoder

There's also support for diffusion models. [link]

Btw, AMMO has been renamed to TensorRT Model Optimizer. [reference]
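
Under the new name the entry point is `modelopt.torch.quantization`. A minimal sketch, assuming a calibrated torch model and a `calib_loader` placeholder:

```python
import modelopt.torch.quantization as mtq

# Int8 preset config; other presets (e.g. SmoothQuant) also exist.
config = mtq.INT8_DEFAULT_CFG

def forward_loop(model):
    # Calibration: run representative data through the model once.
    for images, _ in calib_loader:
        model(images.cuda())

model = mtq.quantize(model, config, forward_loop)
# Optionally fine-tune (QAT) with the usual training loop, then export
# to ONNX and build the int8 engine as before.
```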

brb-nv avatar May 28 '24 05:05 brb-nv