Video-LLaVA
Relation of Video-LLaVA and LanguageBind
Excellent job!
I have three questions.
- I am unclear about the relation between Video-LLaVA and LanguageBind. Does Video-LLaVA use LanguageBind's video encoder?
- I have an encoder for fMRI (another modality). If I want to do contrastive learning between fMRI, video, and the corresponding captions (using Video-LLaVA to caption the videos), should I build on LanguageBind?
- Does Video-LLaVA have any requirements for the number of video frames?
For 1, yes. For 2, you can try both approaches and keep whichever works better. For 3, it depends on the video encoder, which currently supports only 8 frames.
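On the 8-frame limit: one common way to fit a video of arbitrary length into a fixed frame budget is to sample frames uniformly. The sketch below picks the middle frame of each of 8 equal segments. It is only an illustration and not necessarily how Video-LLaVA preprocesses video; the function name `sample_frame_indices` is hypothetical.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 8) -> List[int]:
    """Pick `num_frames` indices spread uniformly over a video.

    Hypothetical helper, not part of Video-LLaVA or LanguageBind.
    The video is split into `num_frames` equal segments and the middle
    frame of each segment is chosen. Clips shorter than `num_frames`
    yield repeated indices, so the output length is always `num_frames`.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Integer form of floor((i + 0.5) * total_frames / num_frames),
    # which avoids floating-point rounding and never exceeds total_frames - 1.
    return [
        ((2 * i + 1) * total_frames) // (2 * num_frames)
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    print(sample_frame_indices(80))  # [5, 15, 25, 35, 45, 55, 65, 75]
    print(sample_frame_indices(3))   # [0, 0, 0, 1, 1, 2, 2, 2]
```

The resulting indices can be passed to whatever video reader you use to decode just those frames before handing them to the encoder.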