VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Is it possible to run the AWQ models using the `run_vila.py` script? I ran the following command:

```
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-3b-AWQ \
    --conv-mode vicuna_v1 \
    --query...
```
I ran this demo script:

```
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-3b \
    --conv-mode vicuna_v1 \
    --query "\n Please describe this video." \
    --video-file "demo.mp4"
```

and got the following...
I'm wondering if I can get an easier pipeline by loading the AWQ weights with vLLM:

```
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The...
```
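For reference, a minimal sketch of what loading an AWQ-quantized checkpoint through vLLM's offline API generally looks like; the model path is taken from the issue above, and whether vLLM actually supports the VILA architecture is an assumption, not something confirmed here:

```python
# Hypothetical sketch: point vLLM at an AWQ checkpoint and tell it the
# weights are AWQ-quantized. This assumes the architecture is supported
# by vLLM, which may not be the case for VILA's vision-language models.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Efficient-Large-Model/VILA1.5-3b-AWQ",  # assumed checkpoint path
    quantization="awq",                            # load AWQ weights
)
sampling = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["Hello, my name is"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```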
How does the DownSampleBlock's performance compare with CAbstractor's?
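My understanding (an assumption, not confirmed by the repository) is that the DownSampleBlock reduces the number of visual tokens by concatenating spatially adjacent patch tokens before the projector, whereas CAbstractor uses a convolutional abstractor. A rough sketch of that token-concatenation idea:

```python
import torch

def downsample_tokens(x: torch.Tensor, grid: int, factor: int = 2) -> torch.Tensor:
    """Rough sketch (assumption, not the VILA implementation): merge each
    factor x factor block of neighboring patch tokens into one token by
    concatenating them along the channel dimension."""
    b, n, c = x.shape                      # (batch, grid*grid tokens, channels)
    assert n == grid * grid and grid % factor == 0
    x = x.view(b, grid // factor, factor, grid // factor, factor, c)
    x = x.permute(0, 1, 3, 2, 4, 5)        # group the factor x factor neighborhoods
    return x.reshape(b, (grid // factor) ** 2, factor * factor * c)

# e.g. 24x24 patch tokens -> 12x12 tokens with 4x wider channels
tokens = torch.randn(1, 24 * 24, 1024)
print(downsample_tokens(tokens, grid=24).shape)  # torch.Size([1, 144, 4096])
```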
Thank you for releasing the new version of VILA (1.5)! I followed the installation instructions at https://github.com/mit-han-lab/llm-awq/tree/main?tab=readme-ov-file#install and ran the command `python vlm_demo_new.py` as detailed here: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat#support-visual-language-models-vila-15-vila-llava. On Ubuntu 22.04...
Thank you for the amazing release! Do you plan to release the checkpoints from different stages, e.g., the checkpoint before SFT? These checkpoints would be valuable for further fine-tuning.
When `data_args.image_aspect_ratio = 'resize'`, it seems that mm_utils.process_image returns the image as a PIL.Image.Image, which has no `shape` attribute. See https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168. When doing stage 1 alignment training, we...
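As a hypothetical workaround (not the repository's fix), the caller could normalize the return type before anything reads `.shape`, along these lines:

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def as_tensor_image(image):
    """Hypothetical guard: process_image may hand back either a torch.Tensor
    or a PIL.Image.Image (which has .size but no .shape), so convert the
    latter to a (C, H, W) float tensor before any code accesses image.shape."""
    if isinstance(image, Image.Image):
        return to_tensor(image)
    return image
```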
Congrats on adding support for video understanding to VILA, looks super cool! Just curious, is there an updated or new paper with more technical details on how improved video understanding...
In datasets_mixture.py there is a reference to a .json file whose origin is not entirely clear from the name: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/datasets_mixture.py#L62. Is this file the same as https://huggingface.co/datasets/mit-han-lab/ShareGPT4V/blob/main/filter-share-captioner_coco_lcs_sam_1246k_1107.json? If...