VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Results 141 VILA issues

Hi, when you do Sequence Parallel, you are padding with token id 2 = '#': https://github.com/NVlabs/VILA/blob/2b43308f25e63161a172fe9a38e3a04e2fcd12ef/llava/data/dataset.py#L1372-L1389 Could you let me know why you are padding with this instead of...
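
For context on the padding question, here is a minimal sketch of how a sequence can be padded so that it splits evenly across sequence-parallel ranks. It is not the repository's collator: the function names, `sp_degree`, and the `IGNORE_INDEX = -100` convention are assumptions for illustration. It shows that when padded positions are masked out of attention and excluded from the loss, the particular pad token id has little effect on training.

```python
# Hypothetical sketch, not the code in llava/data/dataset.py.
IGNORE_INDEX = -100  # assumed label value that the loss skips


def pad_for_sequence_parallel(input_ids, labels, sp_degree, pad_token_id):
    """Pad one sample so its length is a multiple of sp_degree."""
    pad_len = (sp_degree - len(input_ids) % sp_degree) % sp_degree
    padded_ids = input_ids + [pad_token_id] * pad_len
    # Padded positions carry IGNORE_INDEX so they never contribute to the loss.
    padded_labels = labels + [IGNORE_INDEX] * pad_len
    # Padded positions are masked out of attention.
    attention_mask = [1] * len(input_ids) + [0] * pad_len
    return padded_ids, padded_labels, attention_mask


def shard(sequence, sp_degree):
    """Split an evenly divisible sequence into sp_degree contiguous chunks."""
    chunk = len(sequence) // sp_degree
    return [sequence[i * chunk:(i + 1) * chunk] for i in range(sp_degree)]


ids, labels, mask = pad_for_sequence_parallel(
    [5, 6, 7, 8, 9], [5, 6, 7, 8, 9], sp_degree=4, pad_token_id=2
)
shards = shard(ids, 4)
```

With five tokens and four ranks, three pad tokens are appended and each rank receives two positions.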

The current DataCollatorForSupervisedDatasetSeqParallel in llava/data/dataset.py is built for image datasets. Using it directly on a video dataset produces many errors. Will you release a similar solution for video...

I want to train a multimodal video understanding model. What should I do? I find the NVILA-15B model supports video inference.

Hello, Author. When I changed the question in the "searching for a needle in the haystack" evaluation from one about the needle to a different question (for example, "please describe...

When I evaluated NVILA-8B-Video on lmms-longvideobench with this script:

```bash
#!/bin/bash
set -e
MODEL_NAMES=( "NVILA-8B-Video" )
SELECTED_TASKS=( "lmms-longvideobench_val_v" )
TASK_STR=$( IFS=, echo "${SELECTED_TASKS[*]}" )
echo "TASK_STR: $TASK_STR"
START_TIME=$(date +%s)
echo...
```

Replacing `+=` with `text_embeds = text_embeds + (...)` avoids the "leaf Variable that requires grad is being used in an in-place operation" RuntimeError in PyTorch.
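
A minimal reproduction of that behavior, assuming only that `text_embeds` is a leaf tensor with `requires_grad=True`; the shapes and the `delta` tensor are placeholders, not the values used in the repository.

```python
import torch


def add_in_place(text_embeds, delta):
    # In-place update: PyTorch rejects this when text_embeds is a leaf
    # tensor that requires grad.
    text_embeds += delta
    return text_embeds


def add_out_of_place(text_embeds, delta):
    # Out-of-place update: creates a new tensor and leaves the leaf untouched.
    return text_embeds + delta


text_embeds = torch.zeros(2, 3, requires_grad=True)  # leaf tensor
delta = torch.ones(2, 3)

try:
    add_in_place(text_embeds, delta)
    in_place_error = None
except RuntimeError as err:
    in_place_error = str(err)

result = add_out_of_place(text_embeds, delta)
result.sum().backward()  # gradients flow back to the original leaf
```

The out-of-place form allocates a new tensor, so autograd can still track the original leaf and populate `text_embeds.grad`.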

Hello Author, I have a question regarding the understanding of the code. In the eval_forward function, I noticed that the code concatenates answer_embeds with input_embeds and then feeds the combined...

When using LongVILA-R1 for video summarization, I encountered an issue where one video chunk took an abnormally long time to process, resulting in a large summary with significant repetition. Model:...

Hello VILA team! First, thank you for open-sourcing this incredible family of Vision Language Models! The work on VILA, NVILA, and is truly impressive, and the focus on efficiency and...