
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

141 VILA issues

If we use the VILADistributedSampler (https://github.com/Efficient-Large-Model/VILA/blob/main/llava/train/llava_trainer.py#L274-L281) for Distributed Training, should the `gradient_accumulation_steps` be hardcoded to 1? Since I notice that when I use 4 nodes (8 GPUs per node) to...
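The interaction the question hints at is that under data-parallel training, gradient accumulation multiplies the effective global batch size along with the world size. A minimal sketch of that arithmetic (hypothetical helper, not from the VILA repo):

```python
# Hypothetical sketch: effective global batch size under distributed
# data-parallel training with gradient accumulation.
def effective_batch_size(per_device_batch: int,
                         grad_accum_steps: int,
                         num_nodes: int,
                         gpus_per_node: int) -> int:
    """Examples consumed per optimizer step across all ranks."""
    world_size = num_nodes * gpus_per_node
    return per_device_batch * grad_accum_steps * world_size

# Example matching the setup above: 4 nodes x 8 GPUs,
# per-device batch 4, gradient_accumulation_steps 2
print(effective_batch_size(4, 2, 4, 8))  # -> 256
```

If a custom sampler already accounts for accumulation when sharding data, setting `gradient_accumulation_steps` above 1 on top of that could double-count, which may be why the questioner suspects it should be fixed at 1.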

Hi, thanks for the great work! May I know if we have access to non-instruct models? (e.g., models after stage-1 or stage-2). For my specific research use case, I would...

As mentioned in the paper (see Fig. 3 on page 3), the question-answer pairs were generated by a Large Language Model (LLM). What prompts were used by the LLM to...

Hi, thanks for releasing this fantastic work. Will you release the scripts or instructions for the datasets used at Stages 4 and 5 in LongVILA (e.g., the new dataset from Shot2Story)?...

Hi! Very impressed by your work on LongVILA. I would like to do long-context training with Qwen/LLaMA 3.1, but currently I only see support for Mixtral. Any chance you plan on...

Is it possible to fine-tune VILA through Hugging Face with a custom image dataset? I don't see any documentation about this.

Hi Yaolug, we have been working on training VILA on Jetson Orin, and it turns out that I have updated my Orin to the JetPack 6 environment. I don't...

Do you have any plans to get involved in the OpenVLM leaderboard? https://huggingface.co/spaces/opencompass/open_vlm_leaderboard I think that needs some effort from your side, but given the performance VILA provides, it would give you good...

Hi, I am trying to use the VILA1.5-40b model for video inference. Below is the command I am using:

```bash
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-40b \
    --conv-mode...
```

Hi, I am encountering an issue when running inference with the Llama-3-VILA1.5-8B model. The error message I receive is `RuntimeError: FlashAttention only supports Ampere GPUs or newer.` I am using...
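That error means the GPU's compute capability predates Ampere (SM 8.0), which FlashAttention requires. A small hypothetical helper (not part of the VILA codebase) shows the check one can run on the `(major, minor)` tuple that `torch.cuda.get_device_capability()` returns:

```python
# Hypothetical helper: FlashAttention requires NVIDIA Ampere (SM 8.0) or newer.
# The capability tuple is what torch.cuda.get_device_capability() would return.
def supports_flash_attention(capability: tuple[int, int]) -> bool:
    """Return True if the (major, minor) compute capability is Ampere+."""
    major, _minor = capability
    return major >= 8

print(supports_flash_attention((8, 0)))  # e.g. A100 (Ampere)  -> True
print(supports_flash_attention((7, 5)))  # e.g. T4 (Turing)    -> False
```

On older GPUs, a common workaround (depending on how the model's `transformers` integration is wired) is to load the model with a non-flash attention backend, e.g. `attn_implementation="eager"`, rather than the FlashAttention kernel.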