
Improving int8 quantization results.

Open severecoder opened this issue 1 year ago • 3 comments

I have used PTQ for int8 export from a PyTorch model, and despite several attempts at calibration there is a significant drop in detection accuracy.

I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

The end goal is a .trt/engine file running inference at int8 precision with the best possible detection metrics.
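
For context, my current PTQ flow looks roughly like this. It's only a sketch, assuming TensorRT 8.x with pycuda; `model.onnx`, `calib_batches` (an iterable of preprocessed float32 NCHW arrays), and the cache path are placeholders:

```python
import os
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed calibration batches to the builder."""
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)      # float32 NCHW arrays
        self.cache_file = cache_file
        self.device_input = None

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = np.ascontiguousarray(next(self.batches))
        except StopIteration:
            return None                   # calibration finished
        if self.device_input is None:
            self.device_input = cuda.mem_alloc(batch.nbytes)
        cuda.memcpy_htod(self.device_input, batch)
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)  # let sensitive layers fall back to fp16
config.int8_calibrator = EntropyCalibrator(calib_batches)

with open("model_int8.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```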

TIA

severecoder avatar May 15 '24 01:05 severecoder

I am moving to quantization-aware training (QAT) to improve the accuracy of the quantized int8 model. Is pytorch_quantization the best tool for that?

pytorch_quantization will be deprecated; please use AMMO now.
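
For anyone still on pytorch_quantization before migrating, the usual QAT recipe looks roughly like this. A sketch only; `build_model`, `calib_loader`, and `dummy_input` are placeholders:

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()    # patch torch layers with quantized variants
model = build_model().cuda()  # Conv/Linear now carry TensorQuantizers

# 1) Collect activation statistics with quantization disabled.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.disable_quant()
        module.enable_calib()
with torch.no_grad():
    for images, _ in calib_loader:  # a few hundred representative batches
        model(images.cuda())

# 2) Load amax values and switch to fake-quantized forward passes.
for module in model.modules():
    if isinstance(module, quant_nn.TensorQuantizer):
        module.load_calib_amax()
        module.disable_calib()
        module.enable_quant()

# 3) Fine-tune briefly at a small learning rate (the actual QAT step),
#    then export ONNX with Q/DQ nodes for TensorRT.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
torch.onnx.export(model, dummy_input, "model_qat.onnx", opset_version=13)
```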

zerollzeng avatar May 17 '24 12:05 zerollzeng

Thanks for the response. Isn't AMMO limited to LLMs?

severecoder avatar May 20 '24 22:05 severecoder

There's also support for diffusion models. [link]

Btw, AMMO has been renamed to TensorRT Model Optimizer. [reference]
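
Under the new name the entry point is `modelopt.torch.quantization`. A minimal sketch, assuming a calibrated torch model and a `calib_loader` placeholder:

```python
import modelopt.torch.quantization as mtq

# Int8 preset config; other presets (e.g. SmoothQuant) also exist.
config = mtq.INT8_DEFAULT_CFG

def forward_loop(model):
    # Calibration: run representative data through the model once.
    for images, _ in calib_loader:
        model(images.cuda())

model = mtq.quantize(model, config, forward_loop)
# Optionally fine-tune (QAT) with the usual training loop, then export
# to ONNX and build the int8 engine as before.
```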

brb-nv avatar May 28 '24 05:05 brb-nv