
InternVL3 AWQ quantization model

crossxxd opened this issue 8 months ago • 2 comments

The performance of InternVL3 is excellent. May I ask when a quantized model, such as an AWQ variant, will be released for deployment? Thanks!

crossxxd avatar Apr 17 '25 00:04 crossxxd

Thank you for your interest in our work. We have released the quantized model here.

Weiyun1025 avatar Apr 18 '25 06:04 Weiyun1025

Thank you for providing the AWQ model. Could you please check the model card? Its usage instructions seem to be out of date; for example, trying to load the model like this fails:

import torch
from transformers import AutoModel

# `path` points to the downloaded AWQ checkpoint
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()

Can the AWQ model be used with the transformers library, both for loading and for inference? Also, is the chat template for the AWQ model the same as for the base model?
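In case it helps, InternVL's AWQ checkpoints are typically served with lmdeploy rather than loaded through plain transformers. A minimal sketch, assuming lmdeploy is installed and that the repo id below is the AWQ checkpoint (the id is an assumption; substitute the actual one):

```python
# Hedged sketch: serve an InternVL3 AWQ checkpoint via lmdeploy's pipeline API.
# MODEL_ID is an assumed Hugging Face repo name, not confirmed in this thread.
MODEL_ID = "OpenGVLab/InternVL3-8B-AWQ"

try:
    from lmdeploy import pipeline, TurbomindEngineConfig

    # model_format="awq" tells the TurboMind backend to expect AWQ weights
    pipe = pipeline(MODEL_ID, backend_config=TurbomindEngineConfig(model_format="awq"))
    print(pipe("Describe the InternVL model family in one sentence."))
except ImportError:
    print("lmdeploy is not installed; try `pip install lmdeploy`")
```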

nzarif avatar Apr 28 '25 15:04 nzarif

Keeping an eye on this.

Eliza-and-black avatar Jun 05 '25 09:06 Eliza-and-black


Hello, did you manage to load the AWQ model of InternVL3? Thank you in advance.

SamiK1909 avatar Jul 07 '25 14:07 SamiK1909