
Per-tensor quantization in SmoothQuant

chensterliu opened this issue 1 year ago • 2 comments

Hello community, I've tried the SmoothQuant flow on an OPT-125m model with the default settings. Unsurprisingly, the activations are quantized per-tensor and the weights per-channel. According to the following table from the SmoothQuant paper, weights can also be quantized per-tensor (SmoothQuant-O3). Is it possible to apply SmoothQuant that way by setting the QuantizeConfig or some other option? I am really aiming for per_tensor quantization of both activations and weights due to a limitation of my hardware. Thanks!

[Table from the SmoothQuant paper showing the per-tensor / per-channel granularity of the O1-O3 quantization settings]
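Concretely, what I had in mind is something like the sketch below. This is only a rough guess based on the neural-compressor 2.x PostTrainingQuantConfig API; the per_tensor weight granularity override in op_type_dict is exactly the part I am not sure is supported, and the op-type key, alpha value, model, and dataloader names are placeholders of mine.

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Rough sketch: enable the SmoothQuant recipe and try to request per-tensor
# granularity for weights as well as activations. Whether the weight override
# is honored is exactly what this issue asks; "Linear" as the op-type key and
# alpha=0.5 are assumptions, not a confirmed configuration.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
    op_type_dict={
        "Linear": {
            "weight": {"granularity": ["per_tensor"]},      # desired, unclear if supported
            "activation": {"granularity": ["per_tensor"]},  # per-tensor activations
        }
    },
)

# model is an OPT-125m instance (e.g. loaded with transformers);
# calib_dataloader yields a small calibration set for the SmoothQuant scales.
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```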

chensterliu · Mar 22, 2024

Hi Chen, thanks for reaching out. Currently, weights can only be quantized per-channel in INC SmoothQuant. Please refer to the SmoothQuant_doc for more details on our implementation. Thanks!
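For reference, the supported flow just enables the smooth_quant recipe and leaves weight granularity at per-channel, roughly like the sketch below (a simplified example against the 2.x API, not copied from the doc; the alpha value, model, and dataloader names are illustrative placeholders):

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Default INC SmoothQuant path: per-tensor activations, per-channel weights.
# alpha=0.5 and the variable names are placeholders for illustration only.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```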

yintong-lu · Apr 29, 2024

I see, thanks for your reply!

chensterliu · Apr 29, 2024