Very low similarity scores between the video and text features

Open rishabh-akridata opened this issue 10 months ago • 2 comments

Hi @leexinhao, I am trying text-to-video retrieval on my own dataset using the demo notebook at https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/demo_video_text_retrieval.ipynb, but the similarity scores between text_features and video_features are very low. I am using the weight file InternVideo2-stage2_1b-224p-f4.pt, and I compute the cosine similarity as the dot product of the text and video features (text_features @ video_features.T). The top scores I get are: array([0.08640765, 0.08618326, 0.08596011, 0.08578135, 0.08574679, 0.08564241, 0.08557957, 0.08552065, 0.08551717, 0.08548111], dtype=float32)

Thanks.

rishabh-akridata avatar Feb 20 '25 09:02 rishabh-akridata
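
A note on the computation described above: a plain dot product equals cosine similarity only when both feature matrices are L2-normalized. Whether the demo notebook already normalizes its outputs is not stated in this thread, so the sketch below normalizes explicitly. It assumes NumPy arrays of shape (num_texts, dim) and (num_videos, dim); the function name `cosine_similarity_matrix` is hypothetical and not part of InternVideo.

```python
import numpy as np


def cosine_similarity_matrix(text_features, video_features, eps=1e-12):
    """Cosine similarity between every text row and every video row.

    Hypothetical helper (not from the InternVideo repo). Inputs are
    assumed to be 2-D arrays of shape (num_texts, dim), (num_videos, dim).
    """
    t = np.asarray(text_features, dtype=np.float32)
    v = np.asarray(video_features, dtype=np.float32)
    # Divide each row by its L2 norm; eps guards against all-zero rows.
    t = t / np.maximum(np.linalg.norm(t, axis=-1, keepdims=True), eps)
    v = v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    # After normalization the dot product is the cosine similarity.
    return t @ v.T
```

If the features are already unit-length, this returns the same values as `text_features @ video_features.T`, so it is safe to apply either way.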

I solved this bug. The cause was that the model weights were not loaded correctly. Make sure that "pretrained_path" in internvideo2_stage2_config.py points to the weight file you downloaded. This step is not mentioned in the repo's DEMO_USAGE_GUIDE.

UnableToUseGit avatar Mar 01 '25 02:03 UnableToUseGit
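
A quick sanity check along the lines of the fix above might look like the following sketch. It only verifies that a checkpoint path points at a real, non-empty file and compares two sets of parameter names. Both helper names (`check_pretrained_path`, `diff_state_dict_keys`) are hypothetical and not part of InternVideo; how the repo actually reads "pretrained_path" is not shown in this thread.

```python
import os


def check_pretrained_path(path):
    """Return a list of problems with a checkpoint path (empty if none).

    Hypothetical helper: checks the filesystem only, not the file contents.
    """
    problems = []
    if not path:
        problems.append("pretrained_path is empty or None")
    elif not os.path.isfile(path):
        problems.append(f"no file at {path!r}")
    elif os.path.getsize(path) == 0:
        problems.append(f"file at {path!r} is empty")
    return problems


def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Report parameter names present on one side but not the other."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing_in_checkpoint": sorted(model_keys - checkpoint_keys),
        "unexpected_in_checkpoint": sorted(checkpoint_keys - model_keys),
    }
```

In PyTorch, `load_state_dict(..., strict=False)` returns the missing and unexpected keys, which can be passed to a check like this. A long list of missing keys would suggest the model is still running with its initial weights.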

Thanks a lot!

MxLearner avatar Apr 24 '25 08:04 MxLearner