Tang Kaihui

10 comments by Tang Kaihui

Hi @yingmuying, thanks for raising this issue. You can use dynamic quantization for the model:

```python
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor import quantization

config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
q_model...
```
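For context, a minimal end-to-end sketch of this dynamic-quantization flow with the 2.x API (the model chosen below is an assumption for illustration):

```python
# Minimal sketch of the 2.x dynamic-quantization flow described above;
# the model used here is an assumption for illustration.
from transformers import AutoModelForCausalLM

from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Dynamic quantization computes activation scales at runtime,
# so no calibration dataloader is required.
config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
q_model = quantization.fit(model, config)
q_model.save("./saved_results")
```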

Hi @yingmuying, thanks for your reply. The `PostTrainingQuantConfig` is used to configure quantization parameters; you can refer to the [config docstring](https://github.com/intel/neural-compressor/blob/e22c61ede2942f7f1ba1cf9e480491371184bb32/neural_compressor/config.py#L1195C1-L1291C8) to understand the meaning of each parameter. There are some other...
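As a hedged illustration of the kinds of parameters the docstring covers (the specific values below are assumptions, not recommendations):

```python
# Hedged illustration of a few PostTrainingQuantConfig parameters from
# the linked docstring; the values chosen here are assumptions.
from neural_compressor.config import PostTrainingQuantConfig

config = PostTrainingQuantConfig(
    device='cpu',
    approach='dynamic',
    excluded_precisions=['bf16'],  # precisions to leave out of tuning
    op_type_dict={                 # per-op-type overrides
        'Linear': {'weight': {'dtype': ['int8']}},
    },
)
```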

> Can we 1) create an `InputCaptureModule` during the prepare stage and 2) initialize an original `AutoRound` at the convert stage, receiving a) the original model and b) the data...
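A rough sketch of the prepare-stage capture idea from that quote, using hypothetical names (this is not the actual INC or AutoRound implementation):

```python
# Rough sketch of the idea quoted above: wrap the model during prepare()
# so calibration inputs are recorded, then hand the original model plus
# the captured data to AutoRound at convert(). Names are hypothetical.
import torch

class InputCaptureModule(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.captured_inputs = []  # calibration batches recorded here

    def forward(self, *args, **kwargs):
        # Record every calibration batch for the convert stage.
        self.captured_inputs.append((args, kwargs))
        return self.model(*args, **kwargs)
```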

Abstract `WeightOnlyLinear` class, with inherited classes `INCWeightOnlyLinear` and `HPUWeightOnlyLinear`. For CPU, how does the WOQ algorithm use the abstract class `WeightOnlyLinear`? Do we use `INCWeightOnlyLinear` instead of `WeightOnlyLinear`?
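An illustrative skeleton of that hierarchy (the real classes live in neural_compressor; the `pack` signature here is an assumption):

```python
# Illustrative skeleton of the hierarchy under discussion; the real
# classes live in neural_compressor and this pack() signature is assumed.
from abc import ABC, abstractmethod

import torch

class WeightOnlyLinear(torch.nn.Module, ABC):
    """Abstract base for weight-only-quantized linear layers."""

    @abstractmethod
    def pack(self, int_weight, scale, zp, bias=None):
        """Pack integer weights with scales/zero-points into the layer."""

class INCWeightOnlyLinear(WeightOnlyLinear):
    """CPU backend; on CPU the WOQ algorithms would instantiate this."""

    def pack(self, int_weight, scale, zp, bias=None):
        self.qweight, self.scale, self.zp = int_weight, scale, zp  # placeholder

class HPUWeightOnlyLinear(WeightOnlyLinear):
    """HPU backend with device-specific packing."""

    def pack(self, int_weight, scale, zp, bias=None):
        self.qweight, self.scale, self.zp = int_weight, scale, zp  # placeholder
```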

Marked as draft; this will migrate to https://github.com/intel/neural-compressor/pull/1883

Hello @chunniunai220ml, thanks for your interest in Intel(R) Neural Compressor. https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples This document describes the 2.x API. The 2.x example link is https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm

Sure, the q_model needs to export a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it has this compressed-model export integrated. It also includes...
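Following the linked doc section, a hedged sketch of that export step (exact arguments may differ across 2.x versions):

```python
# Hedged sketch of the export step from the linked doc section; the
# call is part of the 2.x WOQ flow, but exact arguments may vary.
import torch

compressed_model = q_model.export_compressed_model()
torch.save(compressed_model.state_dict(), "compressed_model.pt")
```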

I suggest you try the 3.x API, where q_model is already the exported compressed model. We will soon update the 3.x example, which supports auto-device detection: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg But we...
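For reference, a hedged sketch of the 3.x prepare/convert flow (RTN is used here as an example weight-only algorithm; module paths follow the 3.x torch extension and may shift between releases):

```python
# Hedged sketch of the 3.x weight-only flow; RTN is just one example
# algorithm, and exact module paths may differ between 3.x releases.
from neural_compressor.torch.quantization import RTNConfig, convert, prepare

quant_config = RTNConfig()            # weight-only RTN configuration
model = prepare(model, quant_config)  # mark/prepare modules for WOQ
q_model = convert(model)              # q_model is already compressed
```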

> as said in comment

Do we need to upgrade transformers in the requirements? https://github.com/intel/auto-round/blob/3c1a678152579bac7ff51b5a6b64076bc792d728/requirements.txt#L12 Will it bring other problems?

> 1. One option is waiting for another 2 or 3 months to upgrade transformers.
> 2. Another is trying the way GPT suggested:
>
> ```python
> import functools...