VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
I noticed that NVILA has three versions: Base, Lite, and Video. What are the differences between them, and how does NVILA-15B perform in video tasks, such as the test results...
I have seen from a previous issue that the model is able to reason across multiple images (see: https://github.com/NVlabs/VILA/issues/20). I wanted to try this with vila-infer as well; however, if I use...
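For context, the kind of command being attempted would look roughly like the sketch below. The flag names mirror the single-image `vila-infer` example in the repo's README; passing several paths to `--media` is an assumption to verify, which is exactly what this issue is asking about.

```bash
# Sketch of a multi-image vila-infer call.
# Flag names follow the README's single-image example; whether --media accepts
# multiple paths is an assumption, not documented behavior.
vila-infer \
    --model-path Efficient-Large-Model/NVILA-15B \
    --conv-mode auto \
    --text "What changed between the first and the second image?" \
    --media image_1.png image_2.png
```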
I am running inference with the Efficient-Large-Model/VILA1.5-13b model. When using the Efficient-Large-Model/VILA1.5-3b and Efficient-Large-Model/Llama-3-VILA1.5-8B models, the results are generated correctly without any issues. However, when running inference with the 13B...
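For reference, a single-image invocation of the sort being compared across the three checkpoints might look like the following sketch. The flag names follow the `llava/eval/run_vila.py` example from the repo, and using `vicuna_v1` as the conv-mode for the 13B checkpoint is an assumption based on its Vicuna backbone.

```bash
# Sketch: same command used for the 3B/8B checkpoints, switched to 13B.
# conv-mode vicuna_v1 is an assumption; verify against the repo's examples.
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-13b \
    --conv-mode vicuna_v1 \
    --query "<image>\n Please describe the image." \
    --image-file demo.png
```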
[2024-12-18 17:36:31,349] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO:     Started server process [3865832]
INFO:     Waiting for application startup.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01
Hi, I wonder what the conv_mode is for VILA1.5-40b in video inference? Additionally, I noticed that the \ token seems to be invalid in video inference. The eval code will automatically add...
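For reference, a video-inference call along these lines is what the question concerns. The flag names follow the repo's video example, and `hermes-2` as the conv-mode is an assumption based on the 40B model's Hermes-2/Yi backbone rather than a confirmed answer.

```bash
# Sketch of video inference with VILA1.5-40b.
# conv-mode hermes-2 is an assumption (Hermes-2/Yi backbone); check the repo.
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-40b \
    --conv-mode hermes-2 \
    --query "<video>\n Please describe this video." \
    --video-file demo.mp4
```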
For quantizing the LLM part of VILA, I would like to know why AWQ was chosen instead of GPTQ. Have you tried using GPTQ to quantize the LLM part? AWQ...

The argument order differs in LLaVA's function, so I updated it so that the arguments can be passed in either order.
I've encountered a persistent issue while running the "Gradio demo: VILA with TinyChat" on a local server, despite following the steps here: [GitHub Link](https://github.com/mit-han-lab/llm-awq/tree/main/tinychat/serve). **Problem:** The model fails...
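For context, the serving setup in that README is a controller / model-worker / Gradio front-end pipeline. The sketch below shows the general shape of the three processes; the module names, ports, and flags here are assumptions recalled from the linked TinyChat serve instructions and should be checked against that page rather than taken as the exact commands.

```bash
# Rough sketch of the TinyChat serving pipeline (run each process in its own terminal).
# Module names, ports, and flags are assumptions; follow the linked README for exact commands.
python -m tinychat.serve.controller --host 0.0.0.0 --port 10000

python -m tinychat.serve.model_worker_new \
    --host 0.0.0.0 --port 40000 \
    --controller http://localhost:10000 \
    --worker http://localhost:40000 \
    --model-path /path/to/VILA1.5-13b-AWQ \
    --quant-path /path/to/awq-weights.pt

python -m tinychat.serve.gradio_web_server \
    --controller http://localhost:10000 \
    --model-list-mode reload
```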
I want to start fine-tuning on my own dataset from stage 2 of VILA1.5-3b. I noticed in `3_sft.sh` that there is a comment for the output of the stage...
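For reference, launching SFT from a stage-2 checkpoint would look roughly like the sketch below. The positional arguments (stage-2 checkpoint path, then output directory) are an assumption based on the pattern of the other stage scripts and should be checked against the header of `3_sft.sh` itself.

```bash
# Sketch: launch SFT (stage 3) from a stage-2 checkpoint of VILA1.5-3b.
# Argument order is an assumption; check 3_sft.sh for the inputs it expects.
STAGE2_CKPT=/path/to/vila1.5-3b-stage2        # hypothetical local stage-2 checkpoint
OUTPUT_DIR=./checkpoints/vila1.5-3b-my-sft    # where the fine-tuned model will be written

# Run from the directory containing 3_sft.sh (the path within the repo may differ).
bash 3_sft.sh "$STAGE2_CKPT" "$OUTPUT_DIR"
```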