Video-LLaMA
What is the input sample of the forward function in videollama?
Hi, I'm wondering what the input sample of the forward function in videollama.py looks like.
It seems to be a dict() with image and text_input as its keys, but I can't find any example of how it is used. I also checked the inference process in demo_audiovideo.py, and it differs from the forward process. Could you provide an example of how to use the forward function in videollama? Thank you very much!
I am also looking for a solution to this!