InternVL3 AWQ quantization model
The performance of InternVL3 is excellent. May I ask when a quantized model, such as AWQ, will be released for deployment? Thanks!
Thank you for your interest in our work. We have released the quantized model here.
Thank you for providing the AWQ model. Could you please check the model card? The instructions there for using the model seem out of date; for example, trying to load the model like this fails:
```python
import torch
from transformers import AutoModel

# path points to the downloaded AWQ checkpoint
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True).eval().cuda()
```
Can the AWQ model be used with the transformers library, both for loading the model and for running inference? Also, is the chat template for the AWQ model the same as for the base model?
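For reference, InternVL AWQ checkpoints are usually served with LMDeploy rather than loaded directly through transformers, since they are quantized into LMDeploy's TurboMind AWQ format. Below is a minimal sketch of that path; the model ID `OpenGVLab/InternVL3-8B-AWQ` and the sample image URL are illustrative assumptions, so substitute the checkpoint you actually downloaded.

```python
# Minimal sketch, assuming the checkpoint is in LMDeploy's AWQ (TurboMind)
# format, as InternVL AWQ releases typically are. The model ID below is
# illustrative; use the actual AWQ repo you downloaded.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline(
    'OpenGVLab/InternVL3-8B-AWQ',  # hypothetical model ID
    backend_config=TurbomindEngineConfig(model_format='awq'))

# Run a single image-plus-text query through the pipeline
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```

With this route the chat template is handled internally by LMDeploy, so you do not need to apply one manually as you would with a plain transformers workflow.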
Keep an eye on this thread.
Hello, did you manage to load the InternVL3 AWQ model? Thank you in advance.