VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Hello, will we see VILA "HD" models that support dynamic high resolution (the "anyres" padding technique), as in LLaVA 1.6? If not, do you have any other recommendations on...
I just followed the steps, but when I run the following code: `# Load model directly` `from transformers import AutoModel` `model = AutoModel.from_pretrained("Efficient-Large-Model/Llama-3-VILA1.5-8B")`, it raises an error: `ValueError: The checkpoint...`
LongVILA is wonderful work, and I would like to know how I can get the datasets for stages 4 and 5 in your paper. There also seems to be no mention of stage 4/5 training...
I am unable to load the model; please provide code to load the model and use it for video inference locally. I mean, I want to use...
After successfully creating the engine files, I want to deploy the VILA model with Triton server. However, it fails because transformers doesn't recognize the model type "llava_llama": ``` [TensorRT-LLM][WARNING] stats_check_period_ms...
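The failure above is the usual symptom of `transformers` reading `model_type` from the checkpoint's `config.json` and not finding it in its internal registry. The sketch below is a simplified, pure-Python illustration of that lookup pattern, not the real `transformers` internals; the registry, function name, and error text are all hypothetical:

```python
# Simplified illustration of a model-type registry lookup failing for a
# custom type such as "llava_llama". Names here are hypothetical, NOT
# the actual transformers implementation.

KNOWN_MODEL_TYPES = {"llama", "mistral", "gpt2"}  # stand-in registry


def resolve_model_type(config: dict) -> str:
    """Return the model type from a config dict, raising the same kind
    of ValueError transformers raises for an unrecognized checkpoint."""
    model_type = config.get("model_type")
    if model_type not in KNOWN_MODEL_TYPES:
        raise ValueError(
            f"The checkpoint has model type '{model_type}', which is not "
            "recognized; custom types must be registered first (or loaded "
            "through the model's own repository code)."
        )
    return model_type


# An unregistered type fails the lookup:
try:
    resolve_model_type({"model_type": "llava_llama"})
except ValueError as err:
    print("lookup failed:", err)

# Registering the custom type first makes the same lookup succeed:
KNOWN_MODEL_TYPES.add("llava_llama")
print(resolve_model_type({"model_type": "llava_llama"}))
```

In a real deployment the analogous fix is to make the serving environment aware of the custom model type before starting Triton, e.g. by installing the VILA repository's modeling code into that environment.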
TinyChatEngine's VILA bash script seems to provide only a chat executable with a single image parameter (https://github.com/mit-han-lab/TinyChatEngine/blob/main/llm/vila_2.7b): `./chat VILA_2.7B INT4 5 $image_path`. How can TinyChatEngine be used to run VILA inference on multi-image understanding problems if...
Hello, I asked VILA to give the bounding boxes of the objects in the photo, and VILA did reply with the bboxes. Then I used code to see if they were correct....
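One way to sanity-check returned boxes is to map them onto the image and verify they are well formed. This minimal helper assumes the model emits normalized `[x1, y1, x2, y2]` corner coordinates in the 0-1 range; that output format is an assumption, and VILA's actual convention may differ:

```python
def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [x1, y1, x2, y2] box (0-1 range) to integer
    pixel coordinates for an image of the given size, validating that the
    box is in range and non-degenerate. Assumes normalized corner
    coordinates, which may not match VILA's actual output format."""
    x1, y1, x2, y2 = bbox
    if not all(0.0 <= v <= 1.0 for v in bbox):
        raise ValueError(f"expected normalized coordinates, got {bbox}")
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"degenerate box: {bbox}")
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


# Example: a box covering the center of a 640x480 image
print(bbox_to_pixels([0.25, 0.25, 0.75, 0.75], 640, 480))
# (160, 120, 480, 360)
```

Drawing the resulting pixel box over the original image (e.g. with any image library) then makes it easy to see whether the model's localization is actually correct.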
Hi! Directly running ``` python -W ignore llava/eval/run_vila.py \ --model-path Efficient-Large-Model/VILA1.5-40b \ --conv-mode hermes-2 \ --query "\n Please describe this video." \ --video-file "demo.mp4" ``` gives `ValueError: too many values...
(VILA) (base) user@ubuntu(125):/data/workspace/zhaoyong/model/VILA$ sh 1.sh [2024-12-18 09:46:02,468] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) INFO: Started server process [2989388] INFO: Waiting for application startup. Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2...
Thanks to the authors for the great work! I am trying to run inference with multiple images as input, but it seems the `run_vila.py` script is no longer available. After checking...