VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Is it possible to run the AWQ models using the `run_vila.py` script? I ran the following command:

```
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-3b-AWQ \
    --conv-mode vicuna_v1 \
    --query...
```
I ran this demo script:

```
python -W ignore llava/eval/run_vila.py \
    --model-path Efficient-Large-Model/VILA1.5-3b \
    --conv-mode vicuna_v1 \
    --query "\n Please describe this video." \
    --video-file "demo.mp4"
```

and got the following...
I'm wondering if I can get an easier pipeline by loading the AWQ weights with vLLM:

```
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The...
```
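For reference, a minimal sketch of what loading an AWQ-quantized checkpoint through vLLM's offline API generally looks like; the model path is taken from the issue above, and whether vLLM actually supports the VILA architecture is an assumption, not something confirmed here:

```python
# Hypothetical sketch: point vLLM at an AWQ checkpoint and tell it the
# weights are AWQ-quantized. This assumes the architecture is supported
# by vLLM, which may not be the case for VILA's vision-language models.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Efficient-Large-Model/VILA1.5-3b-AWQ",  # assumed checkpoint path
    quantization="awq",                            # load AWQ weights
)
sampling = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["Hello, my name is"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```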
How does the DownSampleBlock's performance compare with CAbstractor's?
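My understanding (an assumption, not confirmed by the repository) is that the DownSampleBlock reduces the number of visual tokens by concatenating spatially adjacent patch tokens before the projector, whereas CAbstractor uses a convolutional abstractor. A rough sketch of that token-concatenation idea:

```python
import torch

def downsample_tokens(x: torch.Tensor, grid: int, factor: int = 2) -> torch.Tensor:
    """Rough sketch (assumption, not the VILA implementation): merge each
    factor x factor block of neighboring patch tokens into one token by
    concatenating them along the channel dimension."""
    b, n, c = x.shape                      # (batch, grid*grid tokens, channels)
    assert n == grid * grid and grid % factor == 0
    x = x.view(b, grid // factor, factor, grid // factor, factor, c)
    x = x.permute(0, 1, 3, 2, 4, 5)        # group the factor x factor neighborhoods
    return x.reshape(b, (grid // factor) ** 2, factor * factor * c)

# e.g. 24x24 patch tokens -> 12x12 tokens with 4x wider channels
tokens = torch.randn(1, 24 * 24, 1024)
print(downsample_tokens(tokens, grid=24).shape)  # torch.Size([1, 144, 4096])
```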
Thank you for releasing the new version of VILA (1.5)! I followed the installation instructions at https://github.com/mit-han-lab/llm-awq/tree/main?tab=readme-ov-file#install and ran the command `python vlm_demo_new.py` as detailed here: https://github.com/mit-han-lab/llm-awq/tree/main/tinychat#support-visual-language-models-vila-15-vila-llava. On Ubuntu 22.04...
Thank you for the amazing release! Do you plan to release the checkpoints from different stages, e.g., the checkpoint before SFT? These checkpoints would be valuable for further fine-tuning.
When `data_args.image_aspect_ratio = 'resize'`, it seems that mm_utils.process_image returns the image as a PIL.Image.Image, which has no `shape` attribute. See https://github.com/Efficient-Large-Model/VILA/blob/main/llava/mm_utils.py#L168. When doing stage 1 alignment training, we...
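As a hypothetical workaround (not the repository's fix), the caller could normalize the return type before anything reads `.shape`, along these lines:

```python
import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

def as_tensor_image(image):
    """Hypothetical guard: process_image may hand back either a torch.Tensor
    or a PIL.Image.Image (which has .size but no .shape), so convert the
    latter to a (C, H, W) float tensor before any code accesses image.shape."""
    if isinstance(image, Image.Image):
        return to_tensor(image)
    return image
```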
Congrats on adding support for video understanding to VILA, looks super cool! Just curious, is there an updated or new paper with more technical details on how improved video understanding...
In datasets_mixture.py there is a reference to a .json file whose origin is not entirely clear from the name: https://github.com/Efficient-Large-Model/VILA/blob/main/llava/data/datasets_mixture.py#L62. Is this file the same as https://huggingface.co/datasets/mit-han-lab/ShareGPT4V/blob/main/filter-share-captioner_coco_lcs_sam_1246k_1107.json? If...