
How does PPQ perform real quantization and achieve speed up?

Open YixuanSeanZhou opened this issue 1 year ago • 0 comments

Question

Looking at the forward call of QConv2D, the PPQ torch executor seems to use a fake quantization scheme, where the input and weight go through Q->DQ->Conv rather than Q->INT8_Conv->DQ.
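
To make the distinction concrete, here is a minimal sketch of the two schemes in plain Python. It is not PPQ code: every function name is hypothetical, it assumes symmetric per-tensor INT8 quantization, and it uses a dot product to stand in for a single convolution output position.

```python
# Illustrative only: plain Python, not PPQ code. All names are hypothetical.

def quantize(values, scale, qmin=-128, qmax=127):
    """Symmetric per-tensor quantization to signed 8-bit integers."""
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map integers back to floating point."""
    return [q * scale for q in qvalues]


def fake_quant_dot(x, w, x_scale, w_scale):
    """Q -> DQ -> float op: the arithmetic itself stays in floating point."""
    x_dq = dequantize(quantize(x, x_scale), x_scale)
    w_dq = dequantize(quantize(w, w_scale), w_scale)
    return sum(a * b for a, b in zip(x_dq, w_dq))


def int8_dot(x, w, x_scale, w_scale):
    """Q -> integer op -> DQ: integer accumulation, rescaled once at the end."""
    x_q = quantize(x, x_scale)
    w_q = quantize(w, w_scale)
    acc = sum(a * b for a, b in zip(x_q, w_q))  # integer accumulator
    return acc * (x_scale * w_scale)


x = [0.5, -1.25, 2.0, 0.75]
w = [0.1, 0.2, -0.3, 0.4]
x_scale = 2.0 / 127  # assumed scale: max |x| mapped to 127
w_scale = 0.4 / 127  # assumed scale: max |w| mapped to 127

fake = fake_quant_dot(x, w, x_scale, w_scale)
real = int8_dot(x, w, x_scale, w_scale)
print(fake, real)
```

Both paths give the same value up to floating-point rounding, so the fake scheme reproduces the numerics; only the second path does its multiply-accumulate on integers, which is where a speedup would come from.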

I wonder whether PPQ has an implementation in which the Q/DQ nodes are resolved and real quantized kernels are invoked. If so, could you please provide a code pointer?

Thanks in advance.

YixuanSeanZhou, Aug 08 '24 02:08