VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Hi! Really appreciate your great work. I'm a bit confused about the padding_direction set in LLaMA3's `tokenizer.json` file. As mentioned in the comments, this is used in the model's...
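For reference, a minimal sketch of inspecting the padding configuration a tokenizer ships with; the checkpoint ID below is illustrative (and gated), so substitute the one actually being asked about.

```python
# Minimal sketch: inspect a tokenizer's padding configuration.
# The checkpoint ID is illustrative; substitute the one you are using.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(tok.padding_side)  # "left" or "right", mirroring the direction in tokenizer.json
print(tok.pad_token)     # may be None for Llama-style tokenizers
```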
When attempting to deploy the model to SageMaker manually via a deployment script, or automatically via the Hugging Face Inference Endpoints UI, I receive the same error: "ValueError:...
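The exact ValueError is cut off in the excerpt, but for context, here is a hedged sketch of the kind of SageMaker deployment script the thread refers to, using the `sagemaker` SDK's `HuggingFaceModel`. The execution role, framework versions, task, and instance type are assumptions for illustration; a custom architecture like VILA typically also needs custom inference code rather than the stock inference toolkit.

```python
# Hypothetical deployment sketch using the sagemaker SDK's Hugging Face support.
# Role, framework versions, task, and instance type are assumptions; a custom
# architecture such as VILA generally also requires custom inference code.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker

model = HuggingFaceModel(
    role=role,
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    env={
        "HF_MODEL_ID": "Efficient-Large-Model/Llama-3-VILA1.5-8b",
        "HF_TASK": "image-to-text",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```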
Hi, why did you refactor so that the model is of type 'LanguageModel' rather than 'LanguageModelForCausalLM'? And why did you move 'get_vision_tower' etc. from 'LlavaMetaForCausalLM' to 'LlavaMetaModel'? Best, Orr
This is helpful for researchers using VILA without additional code updates; ignore the misc files created after installation.
This is the output of the model.
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/Llama-3-VILA1.5-8b \
    --conv-mode llama_3 \
    --query "\n Please describe the traffic condition." \
    --image-file "demo_images/av.png"
This is...
AttributeError: 'Image' object has no attribute 'shape'
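This error typically means a `PIL.Image` was passed where an array or tensor with a `.shape` attribute was expected. A minimal sketch of the distinction (the image path is taken from the run_vila.py example above):

```python
# PIL images expose .size (width, height) but not .shape; NumPy arrays do.
from PIL import Image
import numpy as np

img = Image.open("demo_images/av.png")  # path from the run_vila.py example above
print(img.size)       # (W, H) -- a PIL Image has no .shape attribute
arr = np.array(img)   # converting yields an array that does have .shape
print(arr.shape)      # (H, W, C)
```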
What is the conv mode for 3B VILA 1.5?
Hello, and thanks for such a great contribution to the field of interleaved LMMs! This is really great work. I was wondering if there was an example of the format...
Congratulations on the VILA release!! The demo web server currently seems to be down. It would be great to have the demo up on Hugging Face Spaces as well. We'd be...
@Lyken17 I would like to know which model is used as the LLM in VILA1.5-3B. I have not found a Llama model at the 3B parameter scale.