
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Results: 141 VILA issues, sorted by most recently updated.

I noticed that NVILA has three versions: Base, Lite, and Video. What are the differences between them, and how does NVILA-15B perform in video tasks, such as the test results...

I have seen from a previous issue that it was able to reason across multiple images (see: https://github.com/NVlabs/VILA/issues/20). I wanted to try this with vila-infer as well; however, if I use...
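
For anyone attempting the same thing, multi-image prompting can also be exercised through the repository's Python API instead of the vila-infer CLI. The sketch below assumes the `llava.load` / `generate_content` style interface shown in the VILA README; the model name and image paths are placeholders.

```python
import llava

# A minimal sketch, assuming the repo's Python API accepts a prompt list
# that mixes several images with one text query (placeholder paths/model).
model = llava.load("Efficient-Large-Model/VILA1.5-13b")
response = model.generate_content([
    llava.Image("first_image.png"),
    llava.Image("second_image.png"),
    "What changed between these two images?",
])
print(response)
```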

I am running inference with the Efficient-Large-Model/VILA1.5-13b model. When using the Efficient-Large-Model/VILA1.5-3b and Efficient-Large-Model/Llama-3-VILA1.5-8B models, the results are generated correctly without any issues. However, when running inference with the 13B...

[2024-12-18 17:36:31,349] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) INFO: Started server process [3865832] INFO: Waiting for application startup. Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:01

Hi, I wonder what the conv_mode is for VILA1.5-40b in video inference. Additionally, I noted that the \ token seems invalid in video inference. The eval code will automatically add...

For quantizing the LLM part of VILA, I would like to know why AWQ was chosen instead of GPTQ. Have you tried using GPTQ to quantize the LLM part? AWQ...
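
For reference, a weight-only AWQ pass over just the language-model backbone could be sketched with the third-party AutoAWQ package as below. This is an illustration rather than the repo's llm-awq/TinyChat pipeline, and the checkpoint paths and quantization settings are assumptions.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths: an LLM backbone exported from a VILA checkpoint.
llm_path = "path/to/extracted_llm"
quant_path = "path/to/extracted_llm-awq"

model = AutoAWQForCausalLM.from_pretrained(llm_path)
tokenizer = AutoTokenizer.from_pretrained(llm_path)

# Typical 4-bit weight-only AWQ settings (group size 128).
model.quantize(
    tokenizer,
    quant_config={"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"},
)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```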

![image](https://github.com/user-attachments/assets/c87d29ad-0204-437b-bfd3-3a26191d40ba)

The argument order differs in LLaVA's function, so I updated it so that the arguments can be passed in either order.
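
The general pattern this fixes can be sketched as dispatching on argument type instead of position; the helper below is hypothetical and only illustrates the idea, not the actual LLaVA function touched by the PR.

```python
def split_prompt_and_media(*args):
    # Hypothetical helper: accept (prompt, media) in either order by
    # inspecting types, so callers cannot get the order wrong.
    prompt, media = None, None
    for arg in args:
        if isinstance(arg, str):
            prompt = arg
        else:
            media = arg
    if prompt is None or media is None:
        raise ValueError("expected one text prompt and one media argument")
    return prompt, media
```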

I've encountered a persistent issue while running the "Gradio demo: VILA with TinyChat" on a local server, despite following the steps here: [GitHub Link](https://github.com/mit-han-lab/llm-awq/tree/main/tinychat/serve). **Problem:** The model fails...

I want to start fine-tuning on my own dataset from stage 2 of VILA1.5-3b. I noticed in `3_sft.sh` that there is a comment for the output of the stage...