How to get video features in VideoCLIP without any access to captions
For VideoCLIP, how do we get the video encoder features without access to any captions? The default code at https://github.com/facebookresearch/fairseq/blob/main/examples/MMPT/README.md ends up using MMBertForEncoder as the final video encoder, and it requires input_ids, which in turn are built from caps (the tokenized captions). How do we work around this?
Hi, I'm also working on this problem. It seems that MMPTModel.model is an instance of a subclass of MMFusionShare, so you can call MMPTModel.model.forward_video directly when you only have video, or forward_text on the same object when you only have text.
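A rough sketch of why this works, based on my reading of mmpt/models/mmfusion.py (please double-check against your checkout): the aligner appears to lay out text as [CLS] [SEP] <text tokens> [SEP], and the video tower seems to read only the first two positions of caps and cmasks. If so, the caption content never reaches the video branch. The snippet below is plain Python that only mimics that layout; build_caps and video_branch_inputs are hypothetical helpers, not MMPT APIs, and the token ids assume the bert-base-uncased vocabulary.

```python
# Plain-Python illustration of the caption layout as I read the MMPT code.
# Assumptions (not verified against every MMPT version):
#   * text is laid out as [CLS] [SEP] <text tokens> [SEP] <padding>
#   * the video tower only reads caps[:2] and cmasks[:2]
# Token ids are from the bert-base-uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def build_caps(text_ids, max_len=16):
    """Hypothetical stand-in for the aligner: return (caps, cmasks) for text_ids."""
    tokens = [CLS_ID, SEP_ID] + list(text_ids) + [SEP_ID]
    tokens = tokens[:max_len]  # naive truncation, fine for this illustration
    caps = tokens + [PAD_ID] * (max_len - len(tokens))
    cmasks = [True] * len(tokens) + [False] * (max_len - len(tokens))
    return caps, cmasks


def video_branch_inputs(caps, cmasks, vmasks):
    """Mimic what forward_video appears to take from the text inputs."""
    input_ids = caps[:2]                                # [CLS], [SEP] only
    attention_mask = cmasks[:1] + vmasks + cmasks[1:2]  # [CLS] video... [SEP]
    return input_ids, attention_mask


vmasks = [True, True, True, False]  # 3 real clip features, 1 padded slot
empty = video_branch_inputs(*build_caps([]), vmasks)
real = video_branch_inputs(*build_caps([2070, 3793]), vmasks)
print(empty)          # ([101, 102], [True, True, True, True, False, True])
print(empty == real)  # True
```

If that reading holds, you can pass the caps/cmasks produced for an empty string (via the aligner, as in the README example) to MMPTModel.model.forward_video together with your S3D features and video mask; the exact argument order should be checked in mmfusion.py for your version.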
Hi, did any of you end up getting the VideoCLIP example to work? Could you please share your package versions and stuff? I can't get it to run.