chen
Hello community, I've tried the SmoothQuant flow on an OPT-125m model with the default settings. Unsurprisingly, the activations are quantized per-tensor and the weights per-channel. According to the...
Hey, I think it would be very helpful if you could add a section to the README about the accuracy drop and latency improvement after quantization.
When loading the quantized model (SmoothQuant) with

```python
from neural_compressor.utils.pytorch import load

qmodel = load(qmodel_path, model_fp)
```

I got `RecursiveScriptModule(original_name=QuantizationDispatchModule)`. I'd like to extract the quantized int8 weight matrices, together...
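For pulling raw int8 values out of a model like that, one possible starting point is to walk the loaded module's `state_dict()` and call `int_repr()` on any quantized tensors. This is a hedged sketch using generic PyTorch quantized-tensor APIs, not a documented neural_compressor workflow; `extract_int8_weights` is a hypothetical helper, and whether the packed weights of the scripted module actually surface as quantized tensors in its `state_dict()` may depend on the PyTorch/neural_compressor versions:

```python
import torch


def extract_int8_weights(model):
    """Hypothetical helper: collect raw int8 values from any quantized
    tensors found in a (possibly scripted) model's state_dict."""
    int8_weights = {}
    for name, tensor in model.state_dict().items():
        # Some state_dict entries may not be plain tensors, so guard the check.
        if getattr(tensor, "is_quantized", False):
            int8_weights[name] = tensor.int_repr()  # raw int8 values
    return int8_weights


# Self-contained demo on a plain per-tensor-quantized tensor, standing in
# for a weight pulled out of the loaded model:
w = torch.quantize_per_tensor(
    torch.randn(4, 4), scale=0.05, zero_point=0, dtype=torch.qint8
)
print(w.int_repr().dtype)   # int8 payload
print(w.q_scale())          # the per-tensor scale used for dequantization
```

If the weights turn out to be stored as packed params inside the TorchScript module rather than as plain quantized tensors, they would need to be unpacked first before `int_repr()` applies.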