
How to quantize google/vit-base-patch16-224 pytorch_model.bin to int8 with neural-compressor


yingmuying opened this issue on Feb 19, 2024

Hi @yingmuying, thanks for raising this issue. You can use dynamic quantization for the model:

    from neural_compressor import quantization
    from neural_compressor.config import PostTrainingQuantConfig

    # Dynamic quantization quantizes weights ahead of time and activations
    # on the fly at inference, so no calibration dataset is required.
    config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
    q_model = quantization.fit(your_model, config)  # your_model: your loaded FP32 PyTorch model

If you want to use other quantization methods, please refer to examples.
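For the model named in the title, an end-to-end sketch might look like the following. It assumes the Hugging Face transformers package is installed and loads the checkpoint with ViTForImageClassification; the output directory name is illustrative.

    # Sketch: dynamic int8 quantization of google/vit-base-patch16-224.
    # Assumes `transformers` is installed; the save path is illustrative.
    from transformers import ViTForImageClassification
    from neural_compressor import quantization
    from neural_compressor.config import PostTrainingQuantConfig

    # Load the FP32 checkpoint (pytorch_model.bin) from the Hub.
    model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
    model.eval()

    config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
    q_model = quantization.fit(model, config)

    # Save the quantized model to a directory.
    q_model.save("./vit-base-int8")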

Kaihui-intel commented on Feb 21, 2024

Hi Kaihui,

Thank you very much for your reply. I have just started learning to use neural-compressor for quantization, so some of my questions may be fairly basic. Following neural-compressor/examples/onnxrt/image_recognition/beit/quantization/ptq_static, I got the default flow running, but as soon as I try to change any parameter I get errors. According to https://intel.github.io/neural-compressor/latest/docs/source/quantization.html, ONNX and PyTorch support both symmetric and asymmetric quantization, and the default ptq_static example uses static asymmetric quantization. I don't know how to configure it for symmetric quantization, and the meaning of many of the parameters is unclear to me. I would appreciate your guidance. Thank you!

Best regards,
yingmuying

yingmuying commented on Feb 22, 2024

Hi @yingmuying, thanks for your reply. PostTrainingQuantConfig is used to configure the quantization parameters; you can refer to the config-docstring to understand the meaning of each parameter. There are some other descriptions that may also help.
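As an illustration, the most commonly adjusted parameters look like this (a sketch based on the 2.x docstring; the values shown are examples, not recommendations):

    from neural_compressor.config import PostTrainingQuantConfig

    # Illustrative values only; see the docstring for the full list and defaults.
    config = PostTrainingQuantConfig(
        device='cpu',                   # target device for quantized inference
        backend='default',              # framework backend, e.g. 'default', 'ipex'
        approach='static',              # 'static', 'dynamic', or 'weight_only'
        domain='auto',                  # model-domain hint used by the tuning strategy
        calibration_sampling_size=100,  # number of calibration samples for static PTQ
        excluded_precisions=[],         # e.g. ['bf16'] to disable a mixed precision
    )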

For static symmetric/asymmetric quantization, you can configure it by setting the scheme field in op_type_dict or op_name_dict, e.g.:

    from neural_compressor.config import PostTrainingQuantConfig

    # Quantize the weights of all Conv ops to int8 with a symmetric scheme;
    # keep their activations in fp32. (The scheme field only takes effect on
    # a quantized dtype, so the weight dtype here must be int8, not fp32.)
    op_type_dict = {
        'Conv': {
            "weight": {
                "dtype": ["int8"],
                "scheme": ["sym"],
            },
            "activation": {
                "dtype": ["fp32"]
            }
        }
    }
    config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto', op_type_dict=op_type_dict)

or match all layers by ".*":

op_type_dict = {".*": {"weight": {"dtype": ["int8"], "scheme": "sym"}, "activation": {"dtype": ["fp32"]}}} 
config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto', op_type_dict=op_type_dict)

For more usage, see specify-quantization-rules.
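One step the snippets above leave implicit: with approach='static', quantization.fit also needs calibration data to observe activation ranges. A minimal sketch, where calib_dataloader stands in for your own data:

    from neural_compressor import quantization
    from neural_compressor.config import PostTrainingQuantConfig

    config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto',
                                     op_type_dict=op_type_dict)
    # calib_dataloader is a placeholder: any batched iterable the model accepts,
    # e.g. a torch.utils.data.DataLoader over a few hundred representative inputs.
    q_model = quantization.fit(your_model, config, calib_dataloader=calib_dataloader)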

Kaihui-intel commented on Feb 23, 2024