
Is optimum able to quantize a GPT2 model?

Open lucasjinreal opened this issue 3 years ago • 2 comments

Is optimum able to quantize a GPT2 model?

lucasjinreal avatar Mar 09 '22 13:03 lucasjinreal

Hi @jinfagang, yes, optimum allows you to apply both dynamic and static quantization to a GPT2 model. However, we currently support only a subset of tasks, such as text classification, token classification, and question answering. We plan to add many more in the future (including tasks more relevant to decoder-only and encoder-decoder architectures).

echarlaix avatar Mar 31 '22 11:03 echarlaix

@echarlaix thanks for your reply. What I am more interested in is whether you have any experience quantizing a huge GPT2 model, e.g. one that is 7.6GB in ONNX size?

I managed to quantize a BERT model using onnxruntime's built-in feature, but it fails when applied to a large GPT2 model.

It would help if optimum had an example for large models (specifically models with 125M parameters or more, with a model size larger than 7GB). Small models don't cause problems; large models are where the trouble comes from, e.g. ONNX does not support a single unified model file larger than 2GB.

So if there were any tutorials on quantizing very large models, that would be very useful.

lucasjinreal avatar Mar 31 '22 13:03 lucasjinreal