How to get video features in VideoCLIP without any access to captions

Open learn2phoenix opened this issue 1 year ago • 2 comments

For VideoCLIP, how do we get the video encoder features without access to any captions? The default code at https://github.com/facebookresearch/fairseq/blob/main/examples/MMPT/README.md results in MMBertForEncoder as the final video encoder, which requires input_ids, and these are in turn built from caps. How do we work around this?

learn2phoenix, Jan 20 '24 14:01

Hi, I'm also working on this problem. It seems that MMPTModel.model is one of the subclasses of MMFusionShare, so you can call MMPTModel.model.forward_video directly when you only have video, or ***.forward_text when you only have text.
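A minimal sketch of what calling the video branch directly might look like. It is not confirmed by this thread: it assumes that forward_video takes `vfeats`, `vmasks`, `caps` and `cmasks` as keyword arguments, and that it only reads the two special tokens ([CLS] and [SEP]) from `caps`, so a two-token placeholder can stand in for a real caption. The helper name `video_only_features` and its parameters are hypothetical. Check the signature in your MMPT checkout before relying on it.

```python
import torch


def video_only_features(fusion_model, vfeats, vmasks, cls_id, sep_id):
    """Call the video branch of a fusion model without a real caption.

    Assumptions (not confirmed by the thread): `fusion_model` exposes
    `forward_video(vfeats, vmasks, caps, cmasks)` and only needs the
    [CLS] and [SEP] token ids from `caps`.
    """
    bsz = vfeats.size(0)
    # Placeholder "caption" that holds only the two special tokens.
    caps = torch.tensor(
        [[cls_id, sep_id]], dtype=torch.long, device=vfeats.device
    ).repeat(bsz, 1)
    # Both placeholder tokens are marked as valid (not padding).
    cmasks = torch.ones_like(caps, dtype=torch.bool)
    with torch.no_grad():
        return fusion_model.forward_video(
            vfeats=vfeats, vmasks=vmasks, caps=caps, cmasks=cmasks
        )
```

With the model and tokenizer loaded as in the README, this would presumably be called as `video_only_features(model.model, vfeats, vmasks, tokenizer.cls_token_id, tokenizer.sep_token_id)`. Here `vfeats` and `vmasks` are the per-clip video features and their mask, which you would have to prepare yourself in whatever shape the model expects.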

MarkChenYutian, Feb 25 '24 21:02

Hi, did either of you end up getting the VideoCLIP example to work? Could you please share your package versions and setup? I can't get it to run.

qingy1337, Sep 05 '24 22:09