Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Thanks for your open-source work! Could you tell us roughly how long pre-training and fine-tuning each take on 8x A100 GPUs?
In Video-LLaMA, we notice that you load LlamaForCausalLM from ./models/modelling_llama.py. I wonder why you don't load it directly with "from transformers import LlamaForCausalLM". Did you make any changes to the original...
My specs are:
```
GPU 0: NVIDIA A100-PCIE-40GB
MEM: 60 GB
```
My config file looks like:
```
model:
  arch: video_llama
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: True
  max_txt_len: 512
  end_sym:...
```
This is fixed by extracting the audio from the input video and saving it to a `wav` file with the `ffmpeg-python` package. Fixes #163
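A minimal sketch of that approach, assuming the standard `ffmpeg-python` API (the helper name and the exact codec/sampling settings are illustrative, not necessarily what the PR uses):

```
import ffmpeg  # the ffmpeg-python package

def extract_audio(video_path: str, wav_path: str) -> str:
    """Extract the audio track of a video into a standalone .wav file."""
    (
        ffmpeg
        .input(video_path)
        .output(wav_path, format="wav", acodec="pcm_s16le", ac=1, ar=16000)
        .overwrite_output()  # replace the .wav if it already exists
        .run(quiet=True)
    )
    return wav_path
```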
There seems to be a bug in the function `upload_video()` in the class `Chat` in file `video_llama/conversation/conversation_video.py`. On line 255 of `conversation_video.py`, you directly pass the `video_path` to the...
Can you tell me how to use LoRA or QLoRA to fine-tune this model? Moreover, how can I load the entire model from Hugging Face?
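LoRA is not wired into this repo out of the box, but one plausible route is to wrap the LLM backbone with the `peft` library after the full model is loaded (for QLoRA you would additionally load the base weights in 4-bit via bitsandbytes). A sketch under stated assumptions: the attribute name `llama_model` and the `target_modules` names follow typical LLaMA setups and should be checked against the repo's model code.

```
from peft import LoraConfig, get_peft_model

def add_lora(model):
    """Attach LoRA adapters to the LLM backbone of a loaded Video-LLaMA model.

    Assumption: the backbone is exposed as `model.llama_model`
    (a LlamaForCausalLM); only the adapter weights will be trainable.
    """
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # LLaMA attention projections
        task_type="CAUSAL_LM",
    )
    model.llama_model = get_peft_model(model.llama_model, lora_config)
    model.llama_model.print_trainable_parameters()  # sanity check
    return model
```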
Hi, thank you very much for your great work. I encountered some problems while using the finetune-billa7b-zh model for inference. The configuration is as follows:
```
model:
  arch: video_llama
  model_type:...
```
I configured a local model path, but it still complains that it cannot connect for a remote download. The config file is as follows:
```
model:
  arch: video_llama
  model_type: pretrain_vicuna
  freeze_vit: True
  freeze_qformer: True
  max_txt_len: 512
  end_sym: "###"
  low_resource: False
  frozen_llama_proj: False
  # If you want to use LLaMA-2-chat,
  # some ckpts could be...
```
Thanks for your contributions! It would be nice if you could let me know whether you are going to release Video-LLaMA checkpoints with LLaMA 3.1 anytime soon. Thanks, Shraman
Hi, thank you so much for this work! I was wondering if there is any API to run inference on a custom video-audio question-answering dataset? Also, I wanted to confirm...
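There is no dedicated batch-inference API in the repo, but the `Chat` class behind the Gradio demo (in `video_llama/conversation/conversation_video.py`, also referenced in the bug report above) can be driven programmatically. A rough sketch for one video-question pair; the method names and signatures (`upload_video`, `ask`, `answer`) are assumptions drawn from the demo code and should be verified against the repo:

```
import copy

def answer_one(chat, conv_template, video_path: str, question: str) -> str:
    """Ask one question about one video through the demo's Chat interface.

    Assumptions: `chat` is an initialized Chat instance and `conv_template`
    is the repo's default conversation object.
    """
    conv = copy.deepcopy(conv_template)  # fresh dialogue state per video
    img_list = []                        # filled with encoded video features
    chat.upload_video(video_path, conv, img_list)
    chat.ask(question, conv)
    answer = chat.answer(conv, img_list, max_new_tokens=300)[0]
    return answer
```

Looping this over a custom dataset gives simple sequential inference; for anything large-scale you would want to batch the visual encoding yourself.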