
Support NVILA 15B

Open anhnhust opened this issue 11 months ago • 7 comments

Hi, Do you have plans to support NVILA 15B in this repository? If yes, how soon do you expect it to be ready?

Thank you!

anhnhust avatar Dec 24 '24 09:12 anhnhust

Thank you for reaching out. As far as I know, the NVILA-15B model is very similar to the 8B model, so you can likely just try it with TinyChat 2.0.

Louym avatar Jan 07 '25 12:01 Louym

@Louym

Does TinyChat not support multi-GPU inference?

YoungjaeDev avatar Feb 04 '25 09:02 YoungjaeDev

> Thank you for reaching out. As far as I know, the NVILA-15B model is very similar to the 8B model, so you can likely just try it with TinyChat 2.0.

Thank you for your question. Since TinyChat mainly targets edge AI, we do not support multi-GPU inference for the time being.

Louym avatar Feb 04 '25 10:02 Louym

@Louym

When running the 8B video model on a 4090 GPU, I get an OOM (out-of-memory) error. Would it be okay to proceed with AWQ quantization after reducing num_image_tokens?

YoungjaeDev avatar Feb 04 '25 10:02 YoungjaeDev

Thank you for reaching out. In my experience, you can run the W4A16-quantized NVILA-8B model with no more than 128 frames on a single 4090 GPU.
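As a rough sanity check on that 128-frame budget, here is a back-of-the-envelope memory estimate for a 24 GB GPU. Every per-layer and per-frame number below is an illustrative assumption, not a measurement from TinyChat or the NVILA checkpoint:

```python
# Rough memory estimate for a W4A16-quantized 8B model.
# All layer/head/token counts below are assumptions for illustration.

def w4_weight_gib(n_params: float) -> float:
    """4-bit weights take 0.5 bytes per parameter."""
    return n_params * 0.5 / 1024**3

def kv_cache_gib(n_tokens: int, n_layers: int = 32, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: K and V entries for every layer and every token."""
    return n_tokens * 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1024**3

weights = w4_weight_gib(8e9)            # 8B params at 4 bits -> ~3.7 GiB
tokens_per_frame = 196                  # assumed visual tokens per video frame
kv = kv_cache_gib(128 * tokens_per_frame)
print(f"weights ~ {weights:.1f} GiB, KV cache for 128 frames ~ {kv:.1f} GiB")
```

Even with these rough numbers, weights plus the KV cache for 128 frames stay well under 24 GB, which is consistent with the 4090 fitting the quantized 8B model at that frame count.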

Louym avatar Feb 04 '25 11:02 Louym

> Thank you for reaching out. In my experience, you can run the W4A16-quantized NVILA-8B model with no more than 128 frames on a single 4090 GPU.

Are you referring to the 8B video model? I'll set the frame count to 128 in the configuration and try applying AWQ.

YoungjaeDev avatar Feb 04 '25 13:02 YoungjaeDev

@YoungjaeDev Yes, you can have a try!
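For anyone following along, a minimal sketch of capping the frame count in a local checkpoint before quantizing might look like this. The config field name `num_video_frames` is an assumption here, so check the actual keys in your checkpoint's `config.json`:

```python
# Hypothetical sketch: cap the video frame count in a local checkpoint's
# config.json before running AWQ quantization. The key "num_video_frames"
# is an assumed name -- verify it against your checkpoint.
import json

def cap_video_frames(config_path: str, max_frames: int = 128) -> dict:
    """Lower the frame count in config.json if it exceeds max_frames."""
    with open(config_path) as f:
        cfg = json.load(f)
    if cfg.get("num_video_frames", 0) > max_frames:
        cfg["num_video_frames"] = max_frames
        with open(config_path, "w") as f:
            json.dump(cfg, f, indent=2)
    return cfg
```

After editing the config, rerun the quantization step as usual; the smaller frame budget should reduce peak activation and KV-cache memory during inference.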

Louym avatar Feb 06 '25 03:02 Louym