
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

141 VILA issues, sorted by most recently updated

Hello authors, is there a tool or method for deploying VILA to **mobile phones**? Looking forward to hearing from you!

Running `docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 vila:latest` leads to the error below:

```
=============
== PyTorch ==
=============

NVIDIA Release 24.06 (build 96418707)
PyTorch Version 2.4.0a0+f70bd71
Container...
```

Really appreciate this project! I wonder how to build a Vision-Language-Action model (for robotic manipulation and navigation) from the base models.

Hi, nice work! But it is hard to follow without transformers library support. I am not skilled enough to write the training code and scripts myself....

I'm launching the VILA1.5-3B server with the following command:

```shell
python -W ignore server.py \
    --port 8000 \
    --model-path Efficient-Large-Model/VILA1.5-3B \
    --conv-mode vicuna_v1
```

The server starts successfully without visible errors. However,...
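For anyone probing a server launched this way, the sketch below composes and sends a multimodal chat request. The endpoint path and payload shape are assumptions modeled on OpenAI-style chat APIs, not VILA's confirmed interface; check `server.py` for the actual routes before relying on them.

```python
"""Hypothetical client sketch for a locally running VILA server.

Assumptions (verify against server.py): an OpenAI-style /chat/completions
route, and message content split into text and image_url parts.
"""
import json


def build_payload(prompt: str, image_url: str,
                  model: str = "Efficient-Large-Model/VILA1.5-3B") -> dict:
    """Compose a chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }


def post(payload: dict, host: str = "http://localhost:8000",
         path: str = "/chat/completions") -> dict:
    """Send the request with stdlib urllib, so no extra client library is needed."""
    from urllib.request import Request, urlopen
    req = Request(host + path, data=json.dumps(payload).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

If the server silently accepts the connection but returns nothing, logging the raw response body from `post` is usually the quickest way to see whether the route or the payload shape is the mismatch.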

**Issue Category**: Model Performance & Configuration

**Detailed Description**:

### Current Setup
- **Infrastructure**: GPU-supported EC2 instances
- **Implementation**: FastAPI wrapper on top of the VILA inference command
- **Problem**: Significant performance...

## 📝 Issue Description

When attempting to modify inference hyperparameters such as `temperature`, `top_p`, `max_new_tokens`, and other generation parameters in the `vila-infer` command, the system doesn't expose these critical parameters through...
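A common workaround while the CLI lacks these flags is a thin wrapper that parses them and forwards them as generation kwargs. The sketch below is illustrative only: the flag names are assumptions, not `vila-infer`'s actual interface, and the resulting dict targets a Hugging Face-style `model.generate` call.

```python
"""Sketch of a CLI wrapper exposing the generation knobs the issue asks for.

Assumption: the underlying model accepts HF-style generate() kwargs
(do_sample, temperature, top_p, max_new_tokens).
"""
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Inference with tunable sampling parameters")
    parser.add_argument("--temperature", type=float, default=0.7,
                        help="Sampling temperature; 0 disables sampling")
    parser.add_argument("--top-p", type=float, default=0.9,
                        help="Nucleus sampling probability mass")
    parser.add_argument("--max-new-tokens", type=int, default=512,
                        help="Upper bound on generated tokens")
    return parser


def generation_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed CLI flags into kwargs for an HF-style generate() call."""
    return {
        "do_sample": args.temperature > 0,  # greedy decoding when temperature is 0
        "temperature": args.temperature,
        "top_p": args.top_p,
        "max_new_tokens": args.max_new_tokens,
    }
```

Usage would look like `python wrapper.py --temperature 0.2 --max-new-tokens 256`, with the returned dict splatted into the generate call.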

I fine-tuned `Efficient-Large-Model/NVILA-Lite-8B` with LoRA enabled and got the model checkpoint below. I want to 1) load the saved model and 2) fine-tune the model further from the saved checkpoint....
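One possible route, assuming the checkpoint follows the standard PEFT adapter layout, is to reattach the LoRA weights to the base model with `peft`. This is a sketch, not VILA's confirmed loading path; VILA ships its own loading utilities, which may differ for the vision tower.

```python
"""Sketch: resume LoRA fine-tuning from a saved checkpoint.

Assumptions: the checkpoint directory uses the standard PEFT layout
(adapter_config.json + adapter weights), and the language backbone is
loadable via transformers. VILA's own loader may handle this differently.
"""
from pathlib import Path

PEFT_FILES = ("adapter_config.json", "adapter_model.safetensors")


def is_peft_checkpoint(ckpt_dir: str) -> bool:
    """A directory is a resumable PEFT/LoRA checkpoint if the adapter files exist."""
    d = Path(ckpt_dir)
    return all((d / name).is_file() for name in PEFT_FILES)


def load_for_further_finetuning(base_model_id: str, ckpt_dir: str):
    """Reattach the LoRA adapters to the base model with trainable weights.

    Requires `transformers` and `peft`; imports are deferred so the
    checkpoint check above works without them installed.
    """
    from transformers import AutoModelForCausalLM  # assumption: HF-loadable backbone
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    # is_trainable=True keeps the adapter weights unfrozen for further fine-tuning.
    return PeftModel.from_pretrained(base, ckpt_dir, is_trainable=True)
```

If the checkpoint lacks the adapter files, it was likely saved merged, in which case it loads as an ordinary full model rather than a resumable adapter.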

I read the instructions at https://github.com/NVlabs/VILA/tree/main/finetuning, but they only show how to fine-tune with a single-image QA set. Since NVILA can take multiple images as input at inference time, would it be possible to...

I prepared and registered the shot2story data as detailed here: https://github.com/NVlabs/VILA/tree/main/finetuning . When I try to run https://github.com/NVlabs/VILA/blob/main/longvila/train/5_long_sft_256frames.sh almost exactly as is, but on one H100 node with 8 GPUs (modified...