Relation of Video-LLaVA and LanguageBind

Open song-wensong opened this issue 1 year ago • 1 comments

Excellent job!

I have three questions.

  1. I am unclear about the relation between Video-LLaVA and LanguageBind. Does Video-LLaVA use the video encoder from LanguageBind?
  2. I have an encoder for fMRI (another modality). If I want to do contrastive learning among fMRI, video, and the corresponding captions (using Video-LLaVA to caption the videos), should I build on LanguageBind?
  3. Does Video-LLaVA have any requirements on the number of video frames?

song-wensong avatar Feb 07 '24 14:02 song-wensong

For 1, yes. For 2, you can try both ways and see which works best. For 3, it depends on the video encoder, which currently supports only 8 frames.
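
To illustrate point 3: if the encoder takes exactly 8 frames, a longer clip has to be reduced to 8 before encoding. The sketch below picks evenly spaced frame indices. The function name `uniform_frame_indices` and the segment-centre sampling rule are assumptions made for illustration, not necessarily how Video-LLaVA's own preprocessing selects frames.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> List[int]:
    """Return num_frames indices spread evenly over a clip of total_frames frames.

    Hypothetical helper: the sampling rule here is an assumption, not the
    repository's confirmed behaviour.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Split the clip into num_frames equal segments and take each segment's centre.
    # Clips shorter than num_frames end up with repeated indices.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    print(uniform_frame_indices(80))
    print(uniform_frame_indices(4))
```

The returned indices can then be used to read those frames with whatever video loader you already have.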

LinB203 avatar Feb 07 '24 15:02 LinB203
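
For question 2, whichever starting point is chosen, the contrastive step itself can be sketched as a generic CLIP-style symmetric InfoNCE loss between paired embeddings (for example fMRI embeddings and caption embeddings). This is a minimal pure-Python sketch under stated assumptions: the function names, the temperature of 0.07, and the toy inputs are illustrative, and it is not taken from the LanguageBind or Video-LLaVA training code.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(
    a: Sequence[Sequence[float]],
    b: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, tune for your data
) -> float:
    """Symmetric InfoNCE loss; a[i] and b[i] are embeddings of the same sample."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    # Cosine similarity matrix scaled by the temperature.
    logits = [
        [sum(x * y for x, y in zip(va, vb)) / temperature for vb in b_n]
        for va in a_n
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the a->b and b->a directions.
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


if __name__ == "__main__":
    fmri = [[1.0, 0.0], [0.0, 1.0]]      # toy stand-ins for fMRI embeddings
    caption = [[2.0, 0.0], [0.0, 3.0]]   # toy stand-ins for caption embeddings
    print(symmetric_info_nce(fmri, caption))
```

In practice the same loss would be computed on batched tensors in a deep learning framework; the pure-Python version only shows the arithmetic.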