VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Hello, will we see VILA "HD" models that support dynamic high resolution (the "anyres" padding technique), as in LLaVA 1.6? If not, do you have any other recommendations on...
I just followed the steps, but when I run the following code: `# Load model directly` `from transformers import AutoModel` `model = AutoModel.from_pretrained("Efficient-Large-Model/Llama-3-VILA1.5-8B")`, it raises an error: `ValueError: The checkpoint...`
LongVILA is wonderful work, and I would like to know how I can get the datasets for stages 4 and 5 in your paper. There also seems to be no mention of stage 4/5 training...
I am unable to load the model; please provide code to load the model and use it for video inference locally. I mean, I want to use...
After successfully creating the engine files, I want to deploy the VILA model with Triton server. However, it fails because transformers doesn't recognize the model type "llava_llama": ``` [TensorRT-LLM][WARNING] stats_check_period_ms...
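The failure above is the usual symptom of `transformers` reading `model_type` from the checkpoint's `config.json` and not finding it in its internal registry. The sketch below is a simplified, pure-Python illustration of that lookup pattern, not the real `transformers` internals; the registry, function name, and error text are all hypothetical:

```python
# Simplified illustration of a model-type registry lookup failing for a
# custom type such as "llava_llama". Names here are hypothetical, NOT
# the actual transformers implementation.

KNOWN_MODEL_TYPES = {"llama", "mistral", "gpt2"}  # stand-in registry


def resolve_model_type(config: dict) -> str:
    """Return the model type from a config dict, raising the same kind
    of ValueError transformers raises for an unrecognized checkpoint."""
    model_type = config.get("model_type")
    if model_type not in KNOWN_MODEL_TYPES:
        raise ValueError(
            f"The checkpoint has model type '{model_type}', which is not "
            "recognized; custom types must be registered first (or loaded "
            "through the model's own repository code)."
        )
    return model_type


# An unregistered type fails the lookup:
try:
    resolve_model_type({"model_type": "llava_llama"})
except ValueError as err:
    print("lookup failed:", err)

# Registering the custom type first makes the same lookup succeed:
KNOWN_MODEL_TYPES.add("llava_llama")
print(resolve_model_type({"model_type": "llava_llama"}))
```

In a real deployment the analogous fix is to make the serving environment aware of the custom model type before starting Triton, e.g. by installing the VILA repository's modeling code into that environment.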
TinyChatEngine's VILA bash script seems to provide only a chat executable with a single image parameter (https://github.com/mit-han-lab/TinyChatEngine/blob/main/llm/vila_2.7b): `./chat VILA_2.7B INT4 5 $image_path`. How can TinyChatEngine be used to run VILA inference on multi-image understanding problems if...
Hello, I asked VILA to give the bounding boxes of the objects in the photo, and VILA did reply with the bboxes. Then I used code to see if they were correct....
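One way to sanity-check returned boxes is to map them onto the image and verify they are well formed. This minimal helper assumes the model emits normalized `[x1, y1, x2, y2]` corner coordinates in the 0-1 range; that output format is an assumption, and VILA's actual convention may differ:

```python
def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [x1, y1, x2, y2] box (0-1 range) to integer
    pixel coordinates for an image of the given size, validating that the
    box is in range and non-degenerate. Assumes normalized corner
    coordinates, which may not match VILA's actual output format."""
    x1, y1, x2, y2 = bbox
    if not all(0.0 <= v <= 1.0 for v in bbox):
        raise ValueError(f"expected normalized coordinates, got {bbox}")
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"degenerate box: {bbox}")
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


# Example: a box covering the center of a 640x480 image
print(bbox_to_pixels([0.25, 0.25, 0.75, 0.75], 640, 480))
# (160, 120, 480, 360)
```

Drawing the resulting pixel box over the original image (e.g. with any image library) then makes it easy to see whether the model's localization is actually correct.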
Hi! Directly running ``` python -W ignore llava/eval/run_vila.py \ --model-path Efficient-Large-Model/VILA1.5-40b \ --conv-mode hermes-2 \ --query "\n Please describe this video." \ --video-file "demo.mp4" ``` gives `ValueError: too many values...
(VILA) (base) user@ubuntu(125):/data/workspace/zhaoyong/model/VILA$ sh 1.sh [2024-12-18 09:46:02,468] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect) INFO: Started server process [2989388] INFO: Waiting for application startup. Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 2/2...
Thanks to the authors for the great work! I am trying to run inference with multiple images as input, but it seems the `run_vila.py` script is no longer available. After checking...