llm-awq
Support NVILA 15B
Hi, Do you have plans to support NVILA 15B in this repository? If yes, how soon do you expect it to be ready?
Thank you!
Thank you for reaching out. As far as I know, the NVILA-15B model is very similar to the 8B model. You could simply try it with TinyChat 2.0.
@Louym
Does TinyChat not support multi-GPU inference for loading the model?
Thank you for your question. Since TinyChat mainly focuses on edge AI, we do not support multi-GPU inference for the time being.
@Louym
When running the 8B-video model on a 4090 GPU, I'm getting an OOM (out-of-memory) error. Would it be okay to proceed with AWQ quantization after reducing num_image_tokens?
Thank you for reaching out. In my experience, you can run the W4A16-quantized NVILA-8B model with no more than 128 frames on a single 4090 GPU.
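For anyone following this thread, one common way to stay under such a frame budget is to uniformly subsample the video's frames before they reach the model. The sketch below is illustrative only: the function name and the pre-sampling approach are my assumptions, not part of the actual llm-awq/TinyChat preprocessing pipeline.

```python
# Hedged sketch: pick at most MAX_FRAMES evenly spaced frame indices
# from a video, so the quantized model never sees more than the
# 128-frame budget mentioned above. This is NOT TinyChat's real API.

MAX_FRAMES = 128  # budget from the advice above (assumption: applied via pre-sampling)

def sample_frame_indices(total_frames: int, max_frames: int = MAX_FRAMES) -> list[int]:
    """Return evenly spaced frame indices, capped at max_frames."""
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames  # fractional stride over the full video
    return [int(i * step) for i in range(max_frames)]

# e.g. a 1000-frame clip is reduced to 128 evenly spaced indices
indices = sample_frame_indices(1000)
print(len(indices), indices[0], indices[-1])
```

The selected frames would then be decoded and passed to the vision encoder in place of the full clip.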
Are you referring to the 8B-video model? I'll modify the configuration to 128 frames and try applying AWQ.
@YoungjaeDev Yes, give it a try!