InternVideo
Zero-shot retrieval reproduction issue
According to the README at https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1/Downstream/Video-Text-Retrieval, the zero-shot retrieval results are obtained by running ./zeroshot_scripts/eval_msrvtt.sh, which executes main_task_retrieval.py. However, in main_task_retrieval.py the model is CLIP4Clip, not ViCLIP. How can I run zero-shot video-text retrieval experiments with the pretrained ViCLIP?
You may need to use the InternVideo2 multimodality code and add a ViCLIP model definition to it.
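Whichever codebase ends up producing the embeddings, the retrieval metric itself is independent of the model. The following is a minimal sketch of text-to-video Recall@K computed from precomputed embeddings. It is not code from the InternVideo repository: the function name `recall_at_k` and the assumption that text i is paired with video i are illustrative, and the arrays stand in for whatever the ViCLIP text and video encoders return.

```python
import numpy as np


def recall_at_k(text_emb, video_emb, ks=(1, 5, 10)):
    """Text-to-video Recall@K in percent.

    Assumes row i of text_emb is the caption for row i of video_emb
    (a hypothetical one-to-one pairing, as in the MSR-VTT 1k test split).
    """
    # L2-normalise so the dot product equals cosine similarity.
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    video_emb = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)

    sim = text_emb @ video_emb.T  # shape: (num_texts, num_videos)

    # Rank of the ground-truth video = number of videos scored strictly higher.
    gt_scores = np.diag(sim)[:, None]
    ranks = (sim > gt_scores).sum(axis=1)

    return {f"R@{k}": float((ranks < k).mean() * 100.0) for k in ks}


if __name__ == "__main__":
    # Random stand-ins for encoder outputs; replace with real embeddings.
    rng = np.random.default_rng(0)
    videos = rng.normal(size=(8, 16))
    texts = videos + 0.01 * rng.normal(size=(8, 16))
    print(recall_at_k(texts, videos))
```

With real data, `text_emb` and `video_emb` would be the stacked outputs of the text and video encoders over the test set; video-to-text recall follows by swapping the two arguments.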