TensorRT
Do I have to do PTQ before QAT with pytorch_quantization toolkit?
In the only example provided with the toolkit, the PTQ-calibrated weights are loaded first and QAT is done on top of them. There isn't a standalone QAT example without PTQ.
I tried doing QAT without PTQ (just quant_modules.initialize() to swap the layers, then start training). It runs fine in Python and I got pretty good results, but when I try to export the QAT model to TRT, I get an error saying "the scale of the quantization layer must be positive".
So what is the proper procedure if you want to do only QAT?
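Here is roughly what I did, as a minimal sketch (MyModel, train_loader, the loss, and the input shape are placeholders, not my actual code):

```python
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Swap supported layers (Conv, Linear, ...) for their quantized counterparts.
# This must run BEFORE the model is constructed so fake-quant nodes are inserted.
quant_modules.initialize()

model = MyModel().cuda()                      # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# Ordinary training loop; no PTQ calibration pass beforehand.
for images, targets in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images.cuda()), targets.cuda())
    loss.backward()
    optimizer.step()

# Export with fake-quant nodes lowered to QuantizeLinear/DequantizeLinear.
model.eval()
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model, dummy, "model_qat.onnx", opset_version=13)
```

The ONNX export itself succeeds; the "scale must be positive" error shows up when TensorRT parses the resulting model.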
Has the problem been solved?
No, but since I solved the problem that was making PTQ very slow and buggy, doing PTQ first and then QAT is no longer an issue. I'm still interested in doing QAT alone, though.
@ttyio ^ ^
Hi @deephog, we recommend doing PTQ first, then doing QAT to fine-tune the weights using the fixed quant scale. This helps convergence.
In theory you could also do PTQ and QAT in a single pass by turning on enable_calib and enable_quant at the same time. We have no example of that, but you could give it a try.
We do not support QAT without PTQ. Thanks!
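For reference, a rough sketch of the PTQ calibration pass using those TensorQuantizer switches, along the lines of the toolkit's classification example (model, calib_loader, and the percentile setting are placeholders):

```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import calib

def collect_stats(model, data_loader, num_batches=32):
    """Feed data through the model with quantization off and calibration on."""
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.disable_quant()
            module.enable_calib()
    with torch.no_grad():
        for i, (images, _) in enumerate(data_loader):
            model(images.cuda())
            if i >= num_batches:
                break
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            module.enable_quant()
            module.disable_calib()

def compute_amax(model, **kwargs):
    """Load the collected statistics into each quantizer's amax (the scale)."""
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                if isinstance(module._calibrator, calib.MaxCalibrator):
                    module.load_calib_amax()
                else:
                    module.load_calib_amax(**kwargs)

collect_stats(model, calib_loader)
compute_amax(model, method="percentile", percentile=99.99)
# ...then run the usual QAT fine-tuning loop with the scales set by PTQ.
```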
Thank you for the info!
How did you solve the problem that made PTQ very slow and buggy?
Instead of doing the automatic substitution of all compatible layers, I manually replaced only the modules I felt safe about. It turned out one of my custom layers was the problem, and it was what eventually gave me all-NaN results. After excluding it from quantization, everything went back to normal. This doesn't make the PTQ process any faster, it is still slow, so I just ran one thorough, long PTQ with a lot of samples, saved the calibrated model for future use, and never calibrate again unless necessary.
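By "manually replace" I mean something along these lines (a rough sketch only; the QuantConv2d constructor arguments and the skip list are illustrative, and I only handle Conv2d here):

```python
import torch.nn as nn
from pytorch_quantization import nn as quant_nn

def quantize_selected_convs(model, skip_names=()):
    """Replace nn.Conv2d children with QuantConv2d, copying the weights,
    while leaving any module whose name is in skip_names untouched."""
    for name, child in model.named_children():
        if name in skip_names:              # e.g. the custom layer that produced NaNs
            continue
        if isinstance(child, nn.Conv2d):
            qconv = quant_nn.QuantConv2d(
                child.in_channels, child.out_channels, child.kernel_size,
                stride=child.stride, padding=child.padding,
                dilation=child.dilation, groups=child.groups,
                bias=child.bias is not None)
            qconv.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                qconv.bias.data.copy_(child.bias.data)
            setattr(model, name, qconv)
        else:
            quantize_selected_convs(child, skip_names)   # recurse into submodules
    return model

model = quantize_selected_convs(MyModel(), skip_names=("my_custom_layer",))
```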
Closing old issues that have been inactive for a long time. Thanks all!