Tang Kaihui

10 comments by Tang Kaihui

Hi @yingmuying, thanks for raising this issue. You can use dynamic quantization for the model:

```python
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor import quantization

config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
q_model...
```
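For context, a minimal end-to-end sketch of this dynamic-quantization flow with the 2.x API (the model chosen below is an assumption for illustration):

```python
# Minimal sketch of the 2.x dynamic-quantization flow described above;
# the model used here is an assumption for illustration.
from transformers import AutoModelForCausalLM

from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Dynamic quantization computes activation scales at runtime,
# so no calibration dataloader is required.
config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
q_model = quantization.fit(model, config)
q_model.save("./saved_results")
```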

Hi @yingmuying, thanks for your reply. The `PostTrainingQuantConfig` is used to configure quantization parameters; you can refer to the [config docstring](https://github.com/intel/neural-compressor/blob/e22c61ede2942f7f1ba1cf9e480491371184bb32/neural_compressor/config.py#L1195C1-L1291C8) to understand the meaning of each parameter. There are some other...
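As a hedged illustration of the kinds of parameters the docstring covers (the specific values below are assumptions, not recommendations):

```python
# Hedged illustration of a few PostTrainingQuantConfig parameters from
# the linked docstring; the values chosen here are assumptions.
from neural_compressor.config import PostTrainingQuantConfig

config = PostTrainingQuantConfig(
    device='cpu',
    approach='dynamic',
    excluded_precisions=['bf16'],  # precisions to leave out of tuning
    op_type_dict={                 # per-op-type overrides
        'Linear': {'weight': {'dtype': ['int8']}},
    },
)
```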

> Can we 1) create an `InputCaptureModule` during the prepare stage and 2) initialize an original `AutoRound` at the convert stage, receiving a) the original model and b) the data...
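A rough sketch of the prepare-stage capture idea from that quote, using hypothetical names (this is not the actual INC or AutoRound implementation):

```python
# Rough sketch of the idea quoted above: wrap the model during prepare()
# so calibration inputs are recorded, then hand the original model plus
# the captured data to AutoRound at convert(). Names are hypothetical.
import torch

class InputCaptureModule(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.captured_inputs = []  # calibration batches recorded here

    def forward(self, *args, **kwargs):
        # Record every calibration batch for the convert stage.
        self.captured_inputs.append((args, kwargs))
        return self.model(*args, **kwargs)
```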

Abstract `WeightOnlyLinear` class, with inherited classes `INCWeightOnlyLinear` and `HPUWeightOnlyLinear`. For CPU, how does the WOQ algorithm use the abstract class `WeightOnlyLinear`? Do we use `INCWeightOnlyLinear` instead of `WeightOnlyLinear`?
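An illustrative skeleton of that hierarchy (the real classes live in neural_compressor; the `pack` signature here is an assumption):

```python
# Illustrative skeleton of the hierarchy under discussion; the real
# classes live in neural_compressor and this pack() signature is assumed.
from abc import ABC, abstractmethod

import torch

class WeightOnlyLinear(torch.nn.Module, ABC):
    """Abstract base for weight-only-quantized linear layers."""

    @abstractmethod
    def pack(self, int_weight, scale, zp, bias=None):
        """Pack integer weights with scales/zero-points into the layer."""

class INCWeightOnlyLinear(WeightOnlyLinear):
    """CPU backend; on CPU the WOQ algorithms would instantiate this."""

    def pack(self, int_weight, scale, zp, bias=None):
        self.qweight, self.scale, self.zp = int_weight, scale, zp  # placeholder

class HPUWeightOnlyLinear(WeightOnlyLinear):
    """HPU backend with device-specific packing."""

    def pack(self, int_weight, scale, zp, bias=None):
        self.qweight, self.scale, self.zp = int_weight, scale, zp  # placeholder
```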

Marked as draft; this will migrate to https://github.com/intel/neural-compressor/pull/1883

Hello @chunniunai220ml, thanks for your interest in Intel(R) Neural Compressor. https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples This document describes the 2.x API. The 2.x example link is https://github.com/intel/neural-compressor/tree/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm

Sure, the q_model needs to export a compressed model: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#export-compressed-model You can refer to https://github.com/intel/intel-extension-for-transformers/tree/v1.5/examples/huggingface/pytorch/text-generation/quantization (v1.5) to quantize an INT4 model; it has this compressed-model export integrated. It also includes...
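Following the linked doc section, a hedged sketch of that export step (exact arguments may differ across 2.x versions):

```python
# Hedged sketch of the export step from the linked doc section; the
# call is part of the 2.x WOQ flow, but exact arguments may vary.
import torch

compressed_model = q_model.export_compressed_model()
torch.save(compressed_model.state_dict(), "compressed_model.pt")
```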

I suggest you try the 3.x API, where q_model is already the exported compressed model. We will soon update the 3.x example, which supports auto-device detection: https://github.com/intel/neural-compressor/tree/kaihui/woq_3x_eg But we...
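For reference, a hedged sketch of the 3.x prepare/convert flow (RTN is used here as an example weight-only algorithm; module paths follow the 3.x torch extension and may shift between releases):

```python
# Hedged sketch of the 3.x weight-only flow; RTN is just one example
# algorithm, and exact module paths may differ between 3.x releases.
from neural_compressor.torch.quantization import RTNConfig, convert, prepare

quant_config = RTNConfig()            # weight-only RTN configuration
model = prepare(model, quant_config)  # mark/prepare modules for WOQ
q_model = convert(model)              # q_model is already compressed
```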

> as said in comment

Do we need to upgrade transformers in the requirements? https://github.com/intel/auto-round/blob/3c1a678152579bac7ff51b5a6b64076bc792d728/requirements.txt#L12 Will it bring other problems?

> 1. One option is waiting for another 2 or 3 months to upgrade transformers.
> 2. Another is trying the way GPT suggested:
>
> ```python
> import functools...