
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

58 Video-LLaMA issues, sorted by recently updated

Prevent variable "atts_img" referred before assignment error on training script on README page.

Added `pytorchvideo` under `pip` to resolve `ResolvePackageNotFound`
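
For context, `ResolvePackageNotFound` means conda cannot find the package in its configured channels; listing the package under the `pip:` section of the environment file sidesteps that. A minimal sketch of the idea; the file name, channel, and version pin are assumptions, not the repo's actual environment.yml:

```yaml
# environment.yml (illustrative excerpt)
name: videollama
channels:
  - defaults
dependencies:
  - python=3.9
  - pip
  - pip:
      # pytorchvideo is not resolvable as a conda package here,
      # so it is installed through pip instead.
      - pytorchvideo
```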

Great project! I would like to ask three questions: 1. Does your public checkpoint include the parameters of the 2-layer Q-Former and the linear projection layer? 2. Seeing that...

I'm also training this... I haven't downloaded webvid2.5m yet and then I found that you have done everything I want to do, hahahaha

Using pad_token, but it is not set yet. ![image](https://github.com/DAMO-NLP-SG/Video-LLaMA/assets/49881437/8e689d08-0605-4a32-b642-b36cccd01988)
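
The "Using pad_token, but it is not set yet" message is a Hugging Face tokenizer warning emitted when no padding token is configured. A common workaround, shown here as a sketch and not necessarily the project's intended fix, is to fall back to the EOS token; the weights path is illustrative:

```python
from transformers import LlamaTokenizer

# Sketch: reuse the EOS token as pad_token so batched inputs can be
# padded without the "pad_token ... not set" warning.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-weights")  # illustrative path
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```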

Hi authors, I want to use Video-LLaMA to run inference on my own dataset. I find that the current framework supports a maximum of 32 input frames; if I change...
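
For context, raising the frame budget past 32 interacts with how frames are sampled and with the number of visual tokens produced downstream. A minimal sketch of uniform frame sampling with a configurable frame count; the suggestion that GPU memory and any positional table sized for 32 frames become the real limit is an assumption, not something confirmed by the repo:

```python
import numpy as np

def sample_frame_indices(num_video_frames, n_frms=32):
    # Uniformly pick `n_frms` indices across the whole video.
    # Increasing n_frms beyond 32 also increases the number of visual
    # tokens per video, so memory use (and any fixed positional
    # embedding table sized for 32 frames) may need to grow with it.
    indices = np.linspace(0, max(num_video_frames - 1, 0), num=n_frms)
    return indices.astype(int).tolist()
```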

Hi, I'm wondering what the input **sample** of the forward function in videollama.py is. It seems to be a dict() containing **image** and **text_input** as its keys, but I can't find...
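
For context, a minimal sketch of what such a `samples` dict might look like; the key names follow the issue text, while the tensor shapes and the commented call are assumptions rather than the model's documented interface:

```python
import torch

samples = {
    # A batch of 2 videos: (batch, channels, time, height, width) -- shapes are illustrative.
    "image": torch.randn(2, 3, 8, 224, 224),
    # One instruction string per video in the batch.
    "text_input": [
        "Describe what happens in the video.",
        "What sound can be heard in the clip?",
    ],
}
# loss = model(samples)  # hypothetical call into the forward() in videollama.py
```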

Hi, do you think the following could be bugs in the lr scheduler? 1. https://github.com/DAMO-NLP-SG/Video-LLaMA/blob/ae12557b8510a7cc94baa3d3aea58ea07f6de76a/video_llama/common/optims.py#L83 should be `step=total_cur_step,`? 2. https://github.com/DAMO-NLP-SG/Video-LLaMA/blob/ae12557b8510a7cc94baa3d3aea58ea07f6de76a/video_llama/common/optims.py#L91 should be `epoch=total_cur_step - self.warmup_steps,`? 3. https://github.com/DAMO-NLP-SG/Video-LLaMA/blob/ae12557b8510a7cc94baa3d3aea58ea07f6de76a/video_llama/common/optims.py#L93 should be `max_epoch=self.max_epoch...
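
For reference, the fix this issue suggests amounts to driving both the warmup and the cosine decay with the global step count rather than the step index within the current epoch. A minimal sketch of that schedule; the function name and parameters here are illustrative, not the actual API in optims.py:

```python
import math

def lr_at_step(total_cur_step, warmup_steps, max_steps,
               init_lr, min_lr, warmup_start_lr):
    # Linear warmup over the first `warmup_steps` global steps.
    if total_cur_step < warmup_steps:
        frac = total_cur_step / max(1, warmup_steps)
        return warmup_start_lr + frac * (init_lr - warmup_start_lr)
    # Cosine decay from init_lr to min_lr over the remaining global steps.
    progress = (total_cur_step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (init_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```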