How to quantize google/vit-base-patch16-224 pytorch_model.bin to int8 with neural-compressor
Hi @yingmuying, thanks for raising this issue. You can use dynamic quantization for the model:
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor import quantization
config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
q_model = quantization.fit(your_model, config)
If you want to use other quantization methods, please refer to examples.
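For the specific checkpoint in the question, an end-to-end flow could look like the sketch below. This is a minimal sketch, assuming the transformers library is installed; the model class, save path, and q_model.save call are assumptions based on the Hugging Face and neural-compressor 2.x APIs.

import torch
from transformers import ViTForImageClassification
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor import quantization

# Load the fp32 checkpoint from the Hugging Face hub.
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
model.eval()

# Dynamic quantization needs no calibration data.
config = PostTrainingQuantConfig(device='cpu', approach='dynamic', domain='auto')
q_model = quantization.fit(model, config)

# Save the quantized model (neural-compressor 2.x API).
q_model.save("./vit_int8")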
Hi Kaihui, first of all, thank you very much for your reply. I have just started learning to quantize models with neural-compressor, so some of my questions may be fairly basic. Following neural-compressor/examples/onnxrt/image_recognition/beit/quantization/ptq_static I got the default flow running, but as soon as I try any other parameters it throws errors. According to https://intel.github.io/neural-compressor/latest/docs/source/quantization.html, ONNX and PyTorch support both symmetric and asymmetric quantization, and the default ptq_static example uses static asymmetric quantization. I don't know how to configure it for symmetric quantization, and the meaning of many parameters is also unclear to me. I would appreciate your guidance. Thanks!
Best regards, yingmuying
Hi @yingmuying, thanks for your reply.
The PostTrainingQuantConfig class configures the quantization parameters; you can refer to the config-docstring for the meaning of each parameter. There are some further descriptions in the docs to help you understand them.
For static symmetric or asymmetric quantization, you can configure the scheme field in op_type_dict or op_name_dict.
e.g.
from neural_compressor.config import PostTrainingQuantConfig

# Quantize Conv weights as symmetric int8; keep activations in fp32.
op_type_dict = {
    "Conv": {
        "weight": {
            "dtype": ["int8"],
            "scheme": ["sym"],
        },
        "activation": {
            "dtype": ["fp32"],
        },
    },
}
config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto', op_type_dict=op_type_dict)
Or match all op types with the regex ".*":
op_type_dict = {".*": {"weight": {"dtype": ["int8"], "scheme": "sym"}, "activation": {"dtype": ["fp32"]}}}
config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto', op_type_dict=op_type_dict)
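If you want to target specific layers rather than op types, op_name_dict takes the same per-op configuration keyed by layer-name regexes. A minimal sketch; the layer-name pattern below is hypothetical, so inspect your own model to find the real names:

op_name_dict = {
    # Hypothetical name regex; list your model's modules to find actual layer names.
    "vit.encoder.layer.*": {
        "weight": {"dtype": ["int8"], "scheme": ["sym"]},
        "activation": {"dtype": ["fp32"]},
    },
}
config = PostTrainingQuantConfig(device='cpu', approach='static', domain='auto', op_name_dict=op_name_dict)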
More usage can be found in specify-quantization-rules.
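One practical note: with approach='static', quantization.fit also needs calibration data, since it must observe activation ranges. A minimal sketch, assuming a plain torch DataLoader yielding (input, label) batches; the random tensors are stand-ins for real preprocessed images:

import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import quantization

# Stand-in calibration set; replace with real preprocessed 224x224 images.
images = torch.randn(8, 3, 224, 224)
labels = torch.zeros(8, dtype=torch.long)
calib_dataloader = DataLoader(TensorDataset(images, labels), batch_size=1)

# your_model is the fp32 model to be quantized with the static config above.
q_model = quantization.fit(your_model, config, calib_dataloader=calib_dataloader)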