VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Hi! Really appreciate your great work. I'm a bit confused about the padding_direction set in LLaMA3's `tokenizer.json` file. As mentioned in the comments, this is used in the model's...
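For reference, a minimal sketch of inspecting the padding configuration a tokenizer ships with; the checkpoint ID below is illustrative (and gated), so substitute the one actually being asked about.

```python
# Minimal sketch: inspect a tokenizer's padding configuration.
# The checkpoint ID is illustrative; substitute the one you are using.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(tok.padding_side)  # "left" or "right", mirroring the direction in tokenizer.json
print(tok.pad_token)     # may be None for Llama-style tokenizers
```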
When attempting to deploy the model to SageMaker manually via a deployment script, or automatically via the Hugging Face Inference Endpoints UI, I receive the same error: "ValueError:...
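The exact ValueError is cut off in the excerpt, but for context, here is a hedged sketch of the kind of SageMaker deployment script the thread refers to, using the `sagemaker` SDK's `HuggingFaceModel`. The execution role, framework versions, task, and instance type are assumptions for illustration; a custom architecture like VILA typically also needs custom inference code rather than the stock inference toolkit.

```python
# Hypothetical deployment sketch using the sagemaker SDK's Hugging Face support.
# Role, framework versions, task, and instance type are assumptions; a custom
# architecture such as VILA generally also requires custom inference code.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker

model = HuggingFaceModel(
    role=role,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    env={
        "HF_MODEL_ID": "Efficient-Large-Model/Llama-3-VILA1.5-8b",
        "HF_TASK": "image-to-text",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```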
Hi, why did you refactor so that the model is of type 'LanguageModel' rather than 'LanguageModelForCausalLM'? And why did you move 'get_vision_tower' etc. from 'LlavaMetaForCausalLM' to 'LlavaMetaModel'? Best, Orr
This is helpful for researchers using VILA without additional code updates; ignore the misc files created after installation.
This is the output of the model.
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/Llama-3-VILA1.5-8b \
    --conv-mode llama_3 \
    --query "\n Please describe the traffic condition." \
    --image-file "demo_images/av.png"
This is...
AttributeError: 'Image' object has no attribute 'shape'
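This error typically means a `PIL.Image` was passed where an array or tensor with a `.shape` attribute was expected. A minimal sketch of the distinction (the image path is taken from the run_vila.py example above):

```python
# PIL images expose .size (width, height) but not .shape; NumPy arrays do.
from PIL import Image
import numpy as np

img = Image.open("demo_images/av.png")  # path from the run_vila.py example above
print(img.size)       # (W, H) -- a PIL Image has no .shape attribute
arr = np.array(img)   # converting yields an array that does have .shape
print(arr.shape)      # (H, W, C)
```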
What is the conv mode for 3B VILA 1.5?
Hello, and thanks for such a great contribution to the field of interleaved LMMs! This is really great work. I was wondering if there was an example of the format...
Congratulations on the VILA release!! The demo web server currently seems to be down. It would be great to have the demo up on Hugging Face Spaces as well. We'd be...
@Lyken17 I would like to know which model is used as the LLM in VILA1.5-3B. I have not found a Llama model at the 3B parameter scale.