
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

141 VILA issues

Hi, thank you for your outstanding work! Without a doubt, your recently published VILA v1.5 series pushes the boundaries of multimodal large language models. It is arguably the most powerful...

I noticed a bug in the data sampler. In the original implementation, the same elements are dropped in every epoch. For example, assume the dataset size is 900, and...
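The behavior described matches a sampler that shuffles with a fixed seed, so the tail dropped to make the length divisible by the batch size is identical in every epoch. Below is a minimal sketch of the epoch-seeded pattern that avoids this; the class and parameter names are hypothetical, not VILA's actual sampler:

```python
import torch
from torch.utils.data import Sampler

class EpochAwareSampler(Sampler):
    """Shuffles with a per-epoch seed so the elements dropped to make
    the length divisible by batch_size differ across epochs."""

    def __init__(self, dataset_size: int, batch_size: int, seed: int = 0):
        self.dataset_size = dataset_size
        self.batch_size = batch_size
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # The training loop calls this at the start of each epoch (as it
        # does for torch's DistributedSampler). Without it, the same
        # permutation -- and hence the same dropped tail -- repeats.
        self.epoch = epoch

    def __iter__(self):
        g = torch.Generator()
        g.manual_seed(self.seed + self.epoch)  # epoch-dependent shuffle
        perm = torch.randperm(self.dataset_size, generator=g).tolist()
        usable = (self.dataset_size // self.batch_size) * self.batch_size
        return iter(perm[:usable])  # drop a different tail each epoch

    def __len__(self) -> int:
        return (self.dataset_size // self.batch_size) * self.batch_size
```

With a fixed seed and no `set_epoch` call, `perm` is identical every epoch, so in the reporter's example (dataset size 900) the same leftover samples would never be seen during training.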

I'm trying to get fine-tuning working through the 3_sft.sh script but am encountering an error:

```
Traceback (most recent call last):
  File "/root/VILA/llava/train/train_mem.py", line 36, in <module>
    train()
  File "/root/VILA/llava/train/train.py", line ...
```

Please provide a script to run the VILA1.5-40b int4 quantized model, like this:

Hello, thank you for the amazing work you've done on this project. I'm particularly interested in the upcoming VILA2 model and its associated code. Could you please share any information...

There are multiple mentions of a multimodal sequence-parallel system for inference that can be seamlessly integrated with HF Transformers. However, I am not able to follow this through...

Hello, I am trying to run the VILA model for inference, but I have encountered a couple of issues that I need help with. (1) FlashAttention issue: Initially, I faced a...
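The FlashAttention details of this report are cut off, but a common workaround when flash-attn is missing or incompatible with the GPU is to fall back to another attention backend at load time. Here is a minimal sketch using the standard Hugging Face transformers `attn_implementation` argument; the model path is a placeholder, and this is the generic transformers mechanism, not VILA's own loading code:

```python
import torch
from transformers import AutoModelForCausalLM

def load_with_fallback(model_path: str):
    # Try FlashAttention 2 first; fall back to PyTorch's SDPA backend
    # if the flash-attn package is missing or unsupported on this GPU.
    try:
        return AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            attn_implementation="flash_attention_2",
        )
    except (ImportError, ValueError):
        return AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.float16,
            attn_implementation="sdpa",
        )
```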

Great work and research. My question is simply whether it is possible to use only the visual/video part (already pretrained on a video dataset like Kinetics) for fine-tuning on long video...

Hello, I'm new to LLM serving and multimodal LLMs. I'm looking for a similar example for the LongVILA model, like this one for the VILA1.5 models:

```
python -W ignore llava/eval/run_vila.py --model-path...
```

I found that datasets like Efficient-Large-Model/sherlock_317K can no longer be downloaded; I get a 404 when I look them up on Hugging Face Datasets.