
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Results 58 Video-LLaMA issues

Hello, I am trying to run the Video-LLaMA demo file and I am setting up the environment on my Linux machine. These are the steps I am following: 1) Cloning...

Hello, Thank you for your amazing work. The demo runs fine for a single video. I'm curious if there are any provisions for generating inference on a larger dataset of...
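The question above asks about running inference over a whole dataset rather than one video at a time. A minimal sketch of such a batch loop is below; note that `run_inference` is a hypothetical placeholder for the model call that `demo_video.py` performs for a single video (loading the checkpoint once outside the loop is the key point, since model loading dominates the cost).

```python
from pathlib import Path

def run_inference(video_path: str, prompt: str) -> str:
    # Hypothetical stand-in: in practice this would call Video-LLaMA's
    # chat interface with a preloaded model, as demo_video.py does once
    # per session for a single video.
    return f"[answer for {Path(video_path).name}]"

def batch_inference(video_dir: str, prompt: str) -> dict:
    """Apply the same prompt to every .mp4 file in a directory."""
    results = {}
    for path in sorted(Path(video_dir).glob("*.mp4")):
        results[str(path)] = run_inference(str(path), prompt)
    return results
```

The results dictionary maps each video path to the generated answer, which can then be dumped to JSON for evaluation.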

Is Video-LLaMA capable of comprehending videos that have faces surrounded by bounding boxes (face recognition)? If I asked Video-LLaMA a question to describe what each person in a video is doing...

How much memory is required? 25 GB of memory is not enough to load pytorch_model.

Hello! First of all, I'd like to congratulate you on your great work. I have a question: I'm looking to evaluate the model's performance in a different way by using...

Using video_llama_eval.yaml and demo_video.py with the following configuration: ckpt: finetune-billa7b-zh.pth, llama_proj_model: pretrained_minigpt4.pth, llama_model: Neutralzz/BiLLa-7B-LLM. The error is shown in the image below:

Hello, when fine-tuning on a downstream task with the given config, the results are not very good. What might be the cause? Should I introduce LoRA or a similar method to increase the number of learnable parameters and improve the model's performance on downstream tasks? Thanks for your answer!

Hello, Thanks for the gradio example. But I wonder if there are examples of reading in a video file and then doing Q&A on the command line without using the gradio example, since...
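For the command-line Q&A request above, a minimal sketch is shown below. The `answer_question` function is a hypothetical placeholder; the real implementation would invoke the same chat object that the gradio demo wraps, but driven by `argparse` instead of a web UI.

```python
import argparse

def answer_question(video_path: str, question: str) -> str:
    # Hypothetical stand-in for the actual model call; demo_video.py
    # wires the chat state into gradio callbacks, but the underlying
    # chat object could be called here directly.
    return f"[model answer about {video_path}]"

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Command-line video Q&A (sketch, no gradio)")
    parser.add_argument("video", help="path to the input video file")
    parser.add_argument("question", help="question to ask about the video")
    args = parser.parse_args(argv)
    print(answer_question(args.video, args.question))

if __name__ == "__main__":
    main()
```

Usage would look like `python cli_qa.py clip.mp4 "What is the person doing?"`, with the answer printed to stdout.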

![video-llama](https://github.com/DAMO-NLP-SG/Video-LLaMA/assets/56279639/166e7697-f554-41cf-a68b-1f635e102969) Hello, 1, I have set up Video-LLaMA from this repo. I have downloaded all checkpoints for inference: - I am using VL_LLAMA_2_7B_Finetuned.pth and llama-2-7b-chat-hf from this [Hugging Face repo](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-2-7B-Finetuned/tree/main)....

I tried to perform inference on a single RTX 4090 GPU with 24 GB, and it worked. Now, I am trying to train this model and reduce the GPU memory usage as much...
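For the memory-reduction question above, two standard PyTorch techniques are mixed-precision training and zeroing gradients with `set_to_none=True`. A minimal sketch is below; the tiny `Sequential` model is a toy stand-in (the frozen LLM and visual encoder are omitted), and it is not the project's actual training loop.

```python
import torch

# Toy stand-in for the trainable layers; real training would only
# optimize the unfrozen parts of the model.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
use_cuda = torch.cuda.is_available()
# GradScaler prevents fp16 gradient underflow; it is a no-op on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

def train_step(batch, target):
    # set_to_none=True releases gradient buffers between steps,
    # lowering peak memory compared with zero-filling them.
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast("cuda" if use_cuda else "cpu", enabled=use_cuda):
        loss = torch.nn.functional.mse_loss(model(batch), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

Gradient checkpointing and smaller batch sizes with gradient accumulation are further standard options when activations dominate memory.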