chen
Hello community, I've tried the SmoothQuant flow on an OPT-125m model with the default settings. Unsurprisingly, the activations are quantized per-tensor and the weights per-channel. According to the...
Hey, I think it would be very helpful if you could add a section to the README about the accuracy drop and latency improvement after quantization.
When loading the quantized model (SmoothQuant) with

```python
from neural_compressor.utils.pytorch import load

qmodel = load(qmodel_path, model_fp)
```

I got `RecursiveScriptModule(original_name=QuantizationDispatchModule)`. I'd like to extract the quantized int8 weight matrices, together...
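For pulling raw int8 values out of a model like that, one possible starting point is to walk the loaded module's `state_dict()` and call `int_repr()` on any quantized tensors. This is a hedged sketch using generic PyTorch quantized-tensor APIs, not a documented neural_compressor workflow; `extract_int8_weights` is a hypothetical helper, and whether the packed weights of the scripted module actually surface as quantized tensors in its `state_dict()` may depend on the PyTorch/neural_compressor versions:

```python
import torch


def extract_int8_weights(model):
    """Hypothetical helper: collect raw int8 values from any quantized
    tensors found in a (possibly scripted) model's state_dict."""
    int8_weights = {}
    for name, tensor in model.state_dict().items():
        # Some state_dict entries may not be plain tensors, so guard the check.
        if getattr(tensor, "is_quantized", False):
            int8_weights[name] = tensor.int_repr()  # raw int8 values
    return int8_weights


# Self-contained demo on a plain per-tensor-quantized tensor, standing in
# for a weight pulled out of the loaded model:
w = torch.quantize_per_tensor(
    torch.randn(4, 4), scale=0.05, zero_point=0, dtype=torch.qint8
)
print(w.int_repr().dtype)   # int8 payload
print(w.q_scale())          # the per-tensor scale used for dequantization
```

If the weights turn out to be stored as packed params inside the TorchScript module rather than as plain quantized tensors, they would need to be unpacked first before `int_repr()` applies.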