VILA

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

141 VILA issues (sorted by most recently updated)

The image sizes of the inputs are different; I got the error below when using the dynamic_s2 preprocess method: RuntimeError: stack expects each tensor to be equal size, but got [2560,...
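
For context, this error typically comes from calling torch.stack on image tensors with differing spatial sizes. The sketch below reproduces the failure and shows one common workaround, resizing every image to a shared resolution before batching; the target size of 448 is a placeholder, not VILA's actual preprocessing parameter.

```python
import torch
import torch.nn.functional as F

# Two "images" with different spatial sizes, as happens when inputs vary.
a = torch.randn(3, 448, 448)
b = torch.randn(3, 336, 336)

try:
    batch = torch.stack([a, b])  # fails: stack expects each tensor to be equal size
except RuntimeError as e:
    print(e)

# Common workaround: resize (or pad) every image to one resolution before stacking.
# 448 is an arbitrary placeholder here, not VILA's configured size.
target = 448
resized = [
    F.interpolate(img.unsqueeze(0), size=(target, target),
                  mode="bilinear", align_corners=False).squeeze(0)
    for img in (a, b)
]
batch = torch.stack(resized)  # now works: shape (2, 3, 448, 448)
print(batch.shape)
```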

Hi everyone, I tried to fine-tune the model on my dataset. To get familiar with the process, I first tried the dummy dataset. I executed the following command....

(vila) kirdo@kirdo-System-Product-Name:~/LLM/llm-awq$ python -m awq.entry --model_path /home/kirdo/LLM/NVILA-8B-Video/ --w_bit 4 --q_group_size 128 --run_awq --dump_awq awq_cache/$MODEL-w4-g128.pt Quantization config: {'zero_point': True, 'q_group_size': 128} * Building model /home/kirdo/LLM/NVILA-8B-Video/ [2024-12-30 19:26:13,027] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator...

Hi, thank you for your cool work! After looking at your code, I wanted to know if I understood your use of the conversation template correctly. The templates found in...

I followed your instructions by running: 1. docker build -t vila-server:latest . 2. docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \ -v ./hub:/root/.cache/huggingface/hub \ -it --rm -p 8000:8000...
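
Assuming the container started above exposes an OpenAI-compatible endpoint on the mapped port 8000, a minimal client sketch could look like the following; the base_url path, API key handling, and model name are assumptions, so adjust them to whatever the server actually expects.

```python
from openai import OpenAI

# Assumes an OpenAI-compatible API on localhost:8000; some servers expect a
# "/v1" suffix on the base_url, and the model name below is a placeholder.
client = OpenAI(base_url="http://localhost:8000", api_key="not-needed")

response = client.chat.completions.create(
    model="NVILA-8B",  # placeholder model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```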

Need steps for registering a custom dataset. **Query:** bash scripts/NVILA-Lite/sft.sh runs/train/NVILA-Lite-8B-stage2 "alias to data" where "alias to data" is /home/sample_ft/M3IT/data/captioning/coco/captioning_coco_train.pkl **Error:** 2024-12-30 11:13:46.201 | INFO | llava.data.builder:register_datasets:39 - Registering datasets...

We cloned the following repo: git clone https://huggingface.co/Efficient-Large-Model/NVILA-15B. We used a custom dataset and preprocessed it to create a .pkl file to be used for fine-tuning. After that, we executed...
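
Before kicking off fine-tuning, it can help to sanity-check the generated .pkl file itself; a small inspection snippet like the one below (the file path is a placeholder, and it assumes the pickle holds a list of records) shows how many samples it contains and what one record looks like, which makes format mismatches easier to spot than a training-time stack trace.

```python
import pickle

# Placeholder path; point this at the .pkl produced by your preprocessing script.
pkl_path = "captioning_coco_train.pkl"

with open(pkl_path, "rb") as f:
    records = pickle.load(f)  # assumed to be a list of samples

print(f"{len(records)} records loaded from {pkl_path}")
print("First record:")
print(records[0])  # check that the fields match what the training script expects
```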

We are trying to fine-tune the latest NVILA-15B model. We are using only the COCO (M3IT/data/captioning/coco) dataset as a reference to create our custom dataset, and preprocessed it using the script (python...

I'd like to ask about the base LLM of the following LongVILA checkpoints: - `Efficient-Large-Model/Llama-3-LongVILA-8B-128Frames` - `Efficient-Large-Model/Llama-3-LongVILA-8B-256Frames` - `Efficient-Large-Model/Llama-3-LongVILA-8B-512Frames` These are named with `Llama-3`; however, quoting from the paper: It...
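
One way to check the underlying LLM directly from the published weights, assuming the checkpoint follows VILA's usual layout with an llm/ subfolder containing a standard Hugging Face config, is to pull just that config and read its model metadata; the llm/config.json filename is an assumption about the repo layout.

```python
import json
from huggingface_hub import hf_hub_download

# Assumes the checkpoint stores its language model under an llm/ subfolder,
# as VILA-style repos typically do; adjust the filename if the layout differs.
repo_id = "Efficient-Large-Model/Llama-3-LongVILA-8B-128Frames"
config_path = hf_hub_download(repo_id=repo_id, filename="llm/config.json")

with open(config_path) as f:
    config = json.load(f)

# The architectures and _name_or_path fields usually identify the base LLM.
print(config.get("architectures"))
print(config.get("_name_or_path"))
```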