
Per-tensor quantization in SmoothQuant

chensterliu opened this issue 1 year ago • 2 comments

Hello community, I've tried the SmoothQuant flow on an OPT-125m model with the default settings. Unsurprisingly, the activations are quantized per-tensor and the weights per-channel. According to the following table from the SmoothQuant paper, weights can also be quantized per-tensor (SmoothQuant-O3). Is it possible to apply SmoothQuant that way by setting the QuantizeConfig or some other option? I am really aiming for per_tensor quantization of both activations and weights due to a limitation of my hardware. Thanks!

[Table from the SmoothQuant paper showing the per-tensor / per-channel granularity of the O1-O3 quantization settings]
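Concretely, what I had in mind is something like the sketch below. This is only a rough guess based on the neural-compressor 2.x PostTrainingQuantConfig API; the per_tensor weight granularity override in op_type_dict is exactly the part I am not sure is supported, and the op-type key, alpha value, model, and dataloader names are placeholders of mine.

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Rough sketch: enable the SmoothQuant recipe and try to request per-tensor
# granularity for weights as well as activations. Whether the weight override
# is honored is exactly what this issue asks; "Linear" as the op-type key and
# alpha=0.5 are assumptions, not a confirmed configuration.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
    op_type_dict={
        "Linear": {
            "weight": {"granularity": ["per_tensor"]},      # desired, unclear if supported
            "activation": {"granularity": ["per_tensor"]},  # per-tensor activations
        }
    },
)

# model is an OPT-125m instance (e.g. loaded with transformers);
# calib_dataloader yields a small calibration set for the SmoothQuant scales.
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```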

chensterliu · Mar 22, 2024

Hi Chen, thanks for reaching out. Currently, weights can only be quantized per-channel in INC SmoothQuant. Please refer to the SmoothQuant_doc for more details on our implementation. Thanks!
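For reference, the supported flow just enables the smooth_quant recipe and leaves weight granularity at per-channel, roughly like the sketch below (a simplified example against the 2.x API, not copied from the doc; the alpha value, model, and dataloader names are illustrative placeholders):

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Default INC SmoothQuant path: per-tensor activations, per-channel weights.
# alpha=0.5 and the variable names are placeholders for illustration only.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```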

yintong-lu · Apr 29, 2024

I see, thanks for your reply!

chensterliu · Apr 29, 2024