
InternVL3 AWQ quantization model

crossxxd opened this issue 8 months ago • 2 comments

The performance of InternVL3 is excellent. May I ask when a quantized model, such as an AWQ variant, will be released for deployment? Thanks!

crossxxd avatar Apr 17 '25 00:04 crossxxd

Thank you for your interest in our work. We have released the quantized model here.

Weiyun1025 avatar Apr 18 '25 06:04 Weiyun1025

Thank you for providing the AWQ model. Could you please check the model card? Its usage instructions seem to be out of date; for example, trying to load the model like this fails:

import torch
from transformers import AutoModel

# `path` points to the downloaded AWQ checkpoint
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()

Can the AWQ model be used with the transformers library, both for loading and for inference? Also, is the chat template for the AWQ model the same as for the base model?
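In case it helps, InternVL's AWQ checkpoints are typically served with lmdeploy rather than loaded through plain transformers. A minimal sketch, assuming lmdeploy is installed and that the repo id below is the AWQ checkpoint (the id is an assumption; substitute the actual one):

```python
# Hedged sketch: serve an InternVL3 AWQ checkpoint via lmdeploy's pipeline API.
# MODEL_ID is an assumed Hugging Face repo name, not confirmed in this thread.
MODEL_ID = "OpenGVLab/InternVL3-8B-AWQ"

try:
    from lmdeploy import pipeline, TurbomindEngineConfig

    # model_format="awq" tells the TurboMind backend to expect AWQ weights
    pipe = pipeline(MODEL_ID, backend_config=TurbomindEngineConfig(model_format="awq"))
    print(pipe("Describe the InternVL model family in one sentence."))
except ImportError:
    print("lmdeploy is not installed; try `pip install lmdeploy`")
```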

nzarif avatar Apr 28 '25 15:04 nzarif

Keeping an eye on this.

Eliza-and-black avatar Jun 05 '25 09:06 Eliza-and-black


Hello, did you manage to load the AWQ model of InternVL3? Thank you in advance.

SamiK1909 avatar Jul 07 '25 14:07 SamiK1909