
Do I have to do PTQ before QAT with pytorch_quantization toolkit?

Open deephog opened this issue 2 years ago • 7 comments

In the only example provided in the toolkit, it loads the PTQ-calibrated weights and does QAT on top of them. There isn't a standalone QAT example without PTQ.

I tried to do QAT without PTQ (just quant_modules.initialize() to swap the layers, then start training). It runs well in Python and I got pretty good results, but when I try to export the QAT-trained model to TRT, it gives me an error saying "the scale of the quantization layer must be positive".

So what is the proper procedure if you want to do only QAT?
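
For reference, a minimal sketch of what I mean by QAT-only (build_model and train are placeholders for my own model and training loop):

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Swap supported layers (Conv2d, Linear, ...) for their quantized
# counterparts; this must run before the model is constructed.
quant_modules.initialize()

model = build_model()  # placeholder for my model
train(model)           # placeholder for a normal fine-tuning loop

# Export with fake-quant ONNX ops so TensorRT can parse the Q/DQ nodes.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "model_qat.onnx", opset_version=13)
```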

deephog avatar Jul 12 '22 22:07 deephog

> In the only example provided in the toolkit, it loads the PTQ-calibrated weights and does QAT on top of them. There isn't a standalone QAT example without PTQ.
>
> I tried to do QAT without PTQ (just quant_modules.initialize() to swap the layers, then start training). It runs well in Python and I got pretty good results, but when I try to export the QAT-trained model to TRT, it gives me an error saying "the scale of the quantization layer must be positive".
>
> So what is the proper procedure if you want to do only QAT?

Has the problem been solved?

pangr avatar Jul 15 '22 03:07 pangr

> In the only example provided in the toolkit, it loads the PTQ-calibrated weights and does QAT on top of them. There isn't a standalone QAT example without PTQ. I tried to do QAT without PTQ (just quant_modules.initialize() to swap the layers, then start training). It runs well in Python and I got pretty good results, but when I try to export the QAT-trained model to TRT, it gives me an error saying "the scale of the quantization layer must be positive". So what is the proper procedure if you want to do only QAT?

> Has the problem been solved?

No, but since I solved the problem that made PTQ very slow and buggy, doing PTQ first and then QAT is no longer an issue. I'm still interested in doing QAT alone, though.

deephog avatar Jul 15 '22 03:07 deephog

@ttyio ^ ^

zerollzeng avatar Jul 15 '22 16:07 zerollzeng

Hi @deephog, we recommend doing PTQ first, then doing QAT to fine-tune the weights using the fixed quant scales. This helps convergence.

In theory you can also do PTQ and QAT in a single pass by enabling enable_calib and enable_quant at the same time. We have no example of this, but you could give it a try.

We do not support QAT without PTQ. Thanks!
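
Roughly, the recommended flow looks like this. A sketch only, adapted from the calibration logic in the toolkit's examples; model, calib_loader, and fine_tune are placeholders:

```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import calib

def calibrate(model, data_loader, num_batches=16):
    # Phase 1 (PTQ): collect activation statistics with quantization off.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()
    with torch.no_grad():
        for i, (images, _) in enumerate(data_loader):
            model(images)
            if i + 1 >= num_batches:
                break
    # Phase 2: turn the statistics into fixed quant scales (amax)
    # and re-enable quantization.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                if isinstance(module._calibrator, calib.MaxCalibrator):
                    module.load_calib_amax()
                else:
                    module.load_calib_amax(method="percentile", percentile=99.99)
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()

calibrate(model, calib_loader)  # PTQ: fixes the quant scales
fine_tune(model, train_loader)  # QAT: placeholder fine-tuning loop
```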

ttyio avatar Aug 02 '22 03:08 ttyio

> Hi @deephog, we recommend doing PTQ first, then doing QAT to fine-tune the weights using the fixed quant scales. This helps convergence.
>
> In theory you can also do PTQ and QAT in a single pass by enabling enable_calib and enable_quant at the same time. We have no example of this, but you could give it a try.
>
> We do not support QAT without PTQ. Thanks!

Thank you for the info!

deephog avatar Aug 02 '22 17:08 deephog

> > In the only example provided in the toolkit, it loads the PTQ-calibrated weights and does QAT on top of them. There isn't a standalone QAT example without PTQ. I tried to do QAT without PTQ (just quant_modules.initialize() to swap the layers, then start training). It runs well in Python and I got pretty good results, but when I try to export the QAT-trained model to TRT, it gives me an error saying "the scale of the quantization layer must be positive". So what is the proper procedure if you want to do only QAT?
> >
> > Has the problem been solved?
>
> No, but since I solved the problem that made PTQ very slow and buggy, doing PTQ first and then QAT is no longer an issue. I'm still interested in doing QAT alone, though.

How did you solve the problem that made PTQ very slow and buggy?

VictorGump avatar Aug 05 '22 02:08 VictorGump

> > > In the only example provided in the toolkit, it loads the PTQ-calibrated weights and does QAT on top of them. There isn't a standalone QAT example without PTQ. I tried to do QAT without PTQ (just quant_modules.initialize() to swap the layers, then start training). It runs well in Python and I got pretty good results, but when I try to export the QAT-trained model to TRT, it gives me an error saying "the scale of the quantization layer must be positive". So what is the proper procedure if you want to do only QAT?
> > >
> > > Has the problem been solved?
> >
> > No, but since I solved the problem that made PTQ very slow and buggy, doing PTQ first and then QAT is no longer an issue. I'm still interested in doing QAT alone, though.
>
> How did you solve the problem that made PTQ very slow and buggy?

Instead of doing the automatic substitution of all compatible layers, I manually replaced only the modules I felt safe about. It turned out one of my custom layers was the problem, and it eventually gave me all-NaN results; after excluding it from quantization, everything went back to normal. This doesn't actually make the PTQ process faster than it should be (it's still slow), so I eventually just gave it one thorough, long PTQ run with a lot of samples, saved the result for future use, and never ran PTQ again unless necessary.
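
Roughly what I did, as a sketch (quantize_safe_layers is my own helper; only standard Conv2d layers get swapped for their quant_nn counterparts, and anything custom stays in float):

```python
import torch.nn as nn
from pytorch_quantization import nn as quant_nn

def quantize_safe_layers(model):
    # Replace only plain Conv2d layers; custom modules are left untouched.
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d):
            qconv = quant_nn.QuantConv2d(
                child.in_channels, child.out_channels, child.kernel_size,
                stride=child.stride, padding=child.padding,
                dilation=child.dilation, groups=child.groups,
                bias=child.bias is not None)
            # Reuse the existing float weights.
            qconv.weight = child.weight
            if child.bias is not None:
                qconv.bias = child.bias
            setattr(model, name, qconv)
        else:
            quantize_safe_layers(child)  # recurse into submodules
    return model
```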

deephog avatar Aug 05 '22 23:08 deephog

Closing old issues that have been inactive for a long time. Thanks, all!

ttyio avatar Nov 23 '23 00:11 ttyio