How does PPQ perform real quantization and achieve a speedup?

Open · YixuanSeanZhou opened this issue 1 year ago · 0 comments


Question

Looking at the forward call of QConv2D, the PPQ torch executor seems to execute a fake-quantization scheme: the input and weight go through Q -> DQ -> Conv rather than Q -> INT8_Conv -> DQ.
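To make the distinction concrete, here is a minimal pure-Python sketch of the two schemes. It is not PPQ code and uses no PPQ API. It assumes symmetric per-tensor INT8 quantization (zero-point 0), a 1D convolution instead of Conv2D, and scales picked from the max absolute value; `quantize`, `fake_quant_conv`, and `int8_conv` are hypothetical names. Both paths give the same numbers up to float rounding. The difference is that the second one does the heavy arithmetic on integers and rescales once by `sx * sw`, which is what a real INT8 kernel exploits for speed.

```python
def quantize(x, scale):
    # Symmetric per-tensor INT8: round(x / scale), clamped to [-128, 127]
    return [max(-128, min(127, round(v / scale))) for v in x]

def dequantize(q, scale):
    return [v * scale for v in q]

def conv1d(x, w):
    # Valid 1D cross-correlation (what DL frameworks call "conv"), stride 1, no padding
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def fake_quant_conv(x, w, sx, sw):
    # Q -> DQ -> Conv: values are rounded to the INT8 grid, but the conv runs in float
    return conv1d(dequantize(quantize(x, sx), sx), dequantize(quantize(w, sw), sw))

def int8_conv(x, w, sx, sw):
    # Q -> INT8_Conv -> DQ: the conv runs on integers (an int32 accumulator in real
    # kernels); a single rescale by sx * sw maps the result back to float
    acc = conv1d(quantize(x, sx), quantize(w, sw))
    return [a * sx * sw for a in acc]

x = [0.5, -1.2, 3.3, 0.7, -0.4]
w = [0.25, -0.5, 0.1]
sx = max(abs(v) for v in x) / 127
sw = max(abs(v) for v in w) / 127
print(fake_quant_conv(x, w, sx, sw))
print(int8_conv(x, w, sx, sw))
```

Numerically, fake quantization only simulates the rounding error. Any speedup has to come from a backend that replaces the float conv with an integer one.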

I wonder whether PPQ has an implementation in which the Q/DQ nodes are resolved and real quantized kernels are invoked. If so, could you please provide a code pointer?
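What "resolving" the Q/DQ nodes means can be illustrated with a toy graph pass. This is a hypothetical sketch, not PPQ's or any inference backend's API. It treats the graph as a linear list of op names, omits the weight branch for brevity, and assumes the common pattern where a backend matches DQ -> Conv -> Q and replaces it with a single integer convolution (named `INT8_Conv` here for illustration). Applied to the fake-quantized chain, it produces exactly the Q -> INT8_Conv -> DQ form from the question.

```python
def fuse_qdq(ops):
    """Toy pass: replace each DQ -> Conv -> Q window with one integer conv op.

    `ops` is a linear chain of op names; "INT8_Conv" is a hypothetical op name.
    """
    fused, i = [], 0
    while i < len(ops):
        if ops[i:i + 3] == ["DQ", "Conv", "Q"]:
            # The fused conv consumes INT8 input and emits INT8 output,
            # so the float round-trip around the conv disappears.
            fused.append("INT8_Conv")
            i += 3
        else:
            fused.append(ops[i])
            i += 1
    return fused

fake_quant_chain = ["Q", "DQ", "Conv", "Q", "DQ"]
print(fuse_qdq(fake_quant_chain))  # ['Q', 'INT8_Conv', 'DQ']
```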

Thanks in advance.

Aug 08 '24 02:08 YixuanSeanZhou
