How to get video features in VideoCLIP without any access to captions

Open learn2phoenix opened this issue 1 year ago • 2 comments

For VideoCLIP, how do we get the video encoder features without access to any captions? The default code at https://github.com/facebookresearch/fairseq/blob/main/examples/MMPT/README.md results in MMBertForEncoder as the final video encoder, and it requires input_ids, which in turn are based on caps. How do we work around this?

Jan 20 '24 14:01 learn2phoenix

Hi, I'm also working on this problem. It seems that MMPTModel.model is one of the subclasses of MMFusionShare, so you can call MMPTModel.model.forward_video directly when you have only video, or MMPTModel.model.forward_text when you have only text.
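
A rough sketch of that idea, in case it helps. It only shows the dispatch to forward_video / forward_text on whatever object MMPTModel.model is. The helper encode_single_modality, the StubFusionModel class, and the argument names vfeats and vmasks are all hypothetical placeholders: the thread does not confirm the real signatures, so check the MMPT source for the arguments each method actually expects before relying on this.

```python
def encode_single_modality(fusion_model, modality, **inputs):
    """Call forward_video or forward_text on a fusion model.

    `fusion_model` is assumed to be the object stored in MMPTModel.model.
    `inputs` are passed through unchanged; their names are NOT verified
    against MMPT here.
    """
    method_names = {"video": "forward_video", "text": "forward_text"}
    if modality not in method_names:
        raise ValueError(f"unknown modality: {modality!r}")
    method = getattr(fusion_model, method_names[modality], None)
    if method is None:
        raise AttributeError(
            f"{type(fusion_model).__name__} has no {method_names[modality]}"
        )
    return method(**inputs)


class StubFusionModel:
    """Hypothetical stand-in for MMPTModel.model, NOT the real MMPT class.

    It only records which method was called and with which argument names,
    so the dispatch can be tried without installing fairseq/MMPT.
    """

    def forward_video(self, **inputs):
        return ("video", sorted(inputs))

    def forward_text(self, **inputs):
        return ("text", sorted(inputs))


if __name__ == "__main__":
    stub = StubFusionModel()
    # Placeholder argument names; replace with what forward_video really takes.
    print(encode_single_modality(stub, "video", vfeats=None, vmasks=None))
```

With the real model you would pass MMPTModel.model in place of the stub, together with the tensors its forward_video expects.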

Feb 25 '24 21:02 MarkChenYutian

Hi, did any of you end up getting the VideoCLIP example to work? Could you please share your package versions and stuff? I can't get it to run.

Sep 05 '24 22:09 qingy1337