text-generation-inference
[New Model Request] NVLM
Model description
I'm opening this issue to gauge interest in adding the NVLM model to TGI. If you would like to see it supported, please add an emoji reaction to this issue.
Here is the announcement from Nvidia on the model card:
Today (September 17th, 2024), we introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training.
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
https://huggingface.co/nvidia/NVLM-D-72B