optimum
Can Optimum quantize a GPT-2 model?
Hi @jinfagang, yes, Optimum allows you to apply both dynamic and static quantization to a GPT-2 model. However, we currently support only a subset of tasks, such as text classification, token classification, and question answering. We plan to add many more in the future (including tasks more relevant to decoder-only and encoder-decoder architectures).
@echarlaix thanks for your reply. What I am more interested in is whether you have any experience quantizing a huge GPT-2 model, e.g. one that is 7.6 GB in ONNX size?
I managed to quantize a BERT model using ONNX Runtime's built-in quantization, but when I apply the same approach to a GPT-2 large model, it fails.
It would help if Optimum had an example for large models (specifically models with 125M parameters or more, with a model size larger than 7 GB). Small models are not a problem; large models are where the trouble actually starts, e.g. ONNX does not support a single unified model file larger than 2 GB.
So a tutorial on quantizing very large models would be very useful.