VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Hello author, is there a tool or method to deploy VILA to **mobile phones**? Looking forward to hearing from you!
Running `docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 vila:latest` leads to the error below: ============= == PyTorch == ============= NVIDIA Release 24.06 (build 96418707) PyTorch Version 2.4.0a0+f70bd71 Container...
I really appreciate this project! I wonder how to obtain a Vision Language Action model (for robotic manipulation and navigation) from the base models.
Hi, nice work! But it is hard to follow without transformers library support. I am not skilled enough to write the model training code and scripts myself....
I'm launching the VILA1.5-3B server with the following command: `python -W ignore server.py \ --port 8000 \ --model-path Efficient-Large-Model/VILA1.5-3B \ --conv-mode vicuna_v1` The server starts successfully without visible errors. However,...
**Issue Category**: Model Performance & Configuration

**Detailed Description**:

### Current Setup
- **Infrastructure**: GPU-supported EC2 instances
- **Implementation**: FastAPI wrapper on top of VILA inference command
- **Problem**: Significant performance...
## 📝 Issue Description

When attempting to modify inference hyperparameters like `temperature`, `top_p`, `max_new_tokens`, and other generation parameters in the `vila-infer` command, the system doesn't expose these critical parameters through...
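For readers unfamiliar with the parameters named in this issue, the following is a minimal, generic sketch of what `temperature`, `top_p`, and `max_new_tokens` typically control during text generation. It is plain Python written for illustration only. It is not VILA's code, and `sample_token`, `generate`, and `next_logits` are hypothetical names that do not appear in the repository or in `vila-infer`.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Pick one token index from raw logits using temperature and top-p sampling."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Assumption of this sketch: non-positive temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative
    # probability reaches top_p (nucleus sampling).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # random.choices renormalizes the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def generate(next_logits, max_new_tokens, **sampling):
    """Call the hypothetical next_logits(tokens_so_far) at most max_new_tokens times."""
    tokens = []
    for _ in range(max_new_tokens):
        tokens.append(sample_token(next_logits(tokens), **sampling))
    return tokens
```

Lower `temperature` sharpens the distribution toward the most likely token, smaller `top_p` discards more of the low-probability tail, and `max_new_tokens` caps the output length, which is why exposing them matters for tuning output quality and latency.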
I fine-tuned `Efficient-Large-Model/NVILA-Lite-8B` with LoRA enabled and got the model checkpoint shown below. I want to 1) load the saved model and 2) fine-tune the model from the saved model checkpoint....
I read the instructions at https://github.com/NVlabs/VILA/tree/main/finetuning but they only show how to fine-tune with a single image-QA set. As NVILA can take multiple images as input for inference, would it be possible to...
I prepared and registered shot2story data as detailed here: https://github.com/NVlabs/VILA/tree/main/finetuning . When I try to run https://github.com/NVlabs/VILA/blob/main/longvila/train/5_long_sft_256frames.sh almost exactly as is but on one H100 node with 8 GPUs (modified...