neural-compressor
Per tensor quantization in smoothquant
Hello community, I've tried the SmoothQuant flow on an OPT-125m model with the default settings. Unsurprisingly, the activations are quantized per tensor and the weights per channel. According to the following table from the SmoothQuant paper, weights can also be quantized per tensor (SmoothQuant-O3). Is it possible to apply SmoothQuant this way by setting the QuantizeConfig or something else? I am really aiming for per_tensor quantization on both activations and weights due to a limitation of my hardware. Thanks!
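For reference, here is a minimal sketch of what I am trying, assuming INC 2.x's `PostTrainingQuantConfig` API; the per_tensor override in `op_type_dict` is my guess at how to request it, and `calib_dataloader` is assumed to be defined elsewhere:

```python
from transformers import AutoModelForCausalLM
from neural_compressor import PostTrainingQuantConfig, quantization

# OPT-125m model from the Hugging Face hub.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

conf = PostTrainingQuantConfig(
    approach="static",
    # Enable the SmoothQuant recipe with the usual migration strength alpha.
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}},
    # Hypothetical override: ask for per-tensor granularity on both weights
    # and activations of Linear ops. Whether INC honors this for weights
    # under SmoothQuant is exactly my question here.
    op_type_dict={
        "Linear": {
            "weight": {"granularity": ["per_tensor"]},
            "activation": {"granularity": ["per_tensor"]},
        }
    },
)

# calib_dataloader is assumed to be a small calibration DataLoader.
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```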
Hi Chen, thanks for your response. Currently, weights can only be quantized per-channel in INC SmoothQuant. Please refer to SmoothQuant_doc for more details on our implementation. Thanks!
I see, thanks for your reply!