Similarity scores between video and text features are very low
Hi @leexinhao,
I am trying text-to-video retrieval on my dataset using this notebook: https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/demo_video_text_retrieval.ipynb, but the similarity scores between text_features and video_features are very low. I am using the weight file InternVideo2-stage2_1b-224p-f4.pt, and I compute the cosine similarity as the dot product of the two feature sets (text_features @ video_features.T). These are the scores I get:
```
array([0.08640765, 0.08618326, 0.08596011, 0.08578135, 0.08574679, 0.08564241, 0.08557957, 0.08552065, 0.08551717, 0.08548111], dtype=float32)
```
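For reference, a dot product equals cosine similarity only when both feature matrices are L2-normalized row by row. The sketch below assumes text_features and video_features are NumPy arrays of shape (num_texts, dim) and (num_videos, dim) and normalizes them explicitly before multiplying. `cosine_similarity_matrix` is a hypothetical helper, not code from the notebook, and normalization by itself would not fix the problem in this thread, whose cause turned out to be unloaded weights.

```python
import numpy as np


def cosine_similarity_matrix(text_features: np.ndarray,
                             video_features: np.ndarray) -> np.ndarray:
    """Return a (num_texts, num_videos) matrix of cosine similarities.

    Each row is divided by its L2 norm first, so the matrix product below is a
    true cosine similarity rather than a raw dot product. Assumes no all-zero rows.
    """
    t = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    v = video_features / np.linalg.norm(video_features, axis=-1, keepdims=True)
    return t @ v.T


# Hypothetical toy features standing in for real model outputs.
text = np.array([[3.0, 4.0]])
videos = np.array([[6.0, 8.0], [4.0, -3.0], [-3.0, -4.0]])
sims = cosine_similarity_matrix(text, videos)
print(sims)  # values close to 1, 0 and -1
```

Because every row is rescaled to unit length, the scores always lie in [-1, 1] regardless of feature magnitude, which keeps them comparable across queries.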
Thanks.
I solved this bug. The cause was that the model weights were not loaded correctly. Make sure that `pretrained_path` in internvideo2_stage2_config.py is set correctly. This step is not mentioned in the repo's DEMO_USAGE_GUIDE.
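One way to catch this kind of loading problem is to compare parameter names between the model and the checkpoint. In PyTorch, `model.load_state_dict(state_dict, strict=False)` returns `missing_keys` and `unexpected_keys`; the sketch below mirrors that comparison on plain key lists so it runs without building the model. The helper `compare_state_dict_keys`, its `strip_prefix` option, and the example key names are hypothetical, and how the InternVideo2 checkpoint nests its state dict is not covered here.

```python
from typing import Iterable, List, NamedTuple


class KeyReport(NamedTuple):
    missing: List[str]      # expected by the model, absent from the checkpoint
    unexpected: List[str]   # present in the checkpoint, unused by the model
    matched: int            # number of names present in both


def compare_state_dict_keys(model_keys: Iterable[str],
                            checkpoint_keys: Iterable[str],
                            strip_prefix: str = "") -> KeyReport:
    """Compare parameter names the way a non-strict load would.

    `strip_prefix` (hypothetical, e.g. "module.") is removed from checkpoint
    names before comparing, since wrapped checkpoints sometimes carry one.
    """
    model_set = set(model_keys)
    ckpt_set = {
        k[len(strip_prefix):] if strip_prefix and k.startswith(strip_prefix) else k
        for k in checkpoint_keys
    }
    return KeyReport(
        missing=sorted(model_set - ckpt_set),
        unexpected=sorted(ckpt_set - model_set),
        matched=len(model_set & ckpt_set),
    )


report = compare_state_dict_keys(
    model_keys=["vision.proj.weight", "text.proj.weight"],
    checkpoint_keys=["module.vision.proj.weight"],
    strip_prefix="module.",
)
print(report)  # KeyReport(missing=['text.proj.weight'], unexpected=[], matched=1)
```

A long `missing` list (or `matched == 0`) suggests the checkpoint was not applied, leaving those parameters at their initial values.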
Thanks a lot!