Recall (R@k) much lower than reported in papers

Open asczdvefb42 opened this issue 6 months ago • 1 comment

Here is what I do:

Get the test set of an image-caption dataset (IIITD-20K).
Compute embeddings with my fine-tuned CLIP.
Compute the cosine distance between each text and all images.
Get the k closest images to each text; if the corresponding image is among them, add 1 to the score.
Compute the recall by dividing the score by the size of the test dataset.
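
The steps above can be sketched roughly as follows. This is only a minimal illustration, not my actual code: the function name `recall_at_k` and the array names are made up for the example, and it assumes that row i of `text_emb` is the caption for row i of `image_emb` (one caption per image).

```python
import numpy as np

def recall_at_k(text_emb, image_emb, k):
    """Text-to-image R@k; assumes text_emb[i] describes image_emb[i]."""
    # L2-normalize so the dot product equals cosine similarity.
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)

    # sim[i, j] = cosine similarity between text i and image j.
    sim = text_emb @ image_emb.T

    # Smallest cosine distance means largest similarity, so sort descending
    # and keep the indices of the k closest images for each text.
    topk = np.argsort(-sim, axis=1)[:, :k]

    # A text scores a hit if its own image index appears among its top k.
    targets = np.arange(sim.shape[0])[:, None]
    hits = (topk == targets).any(axis=1)

    # Score divided by the number of test texts.
    return float(hits.mean())
```

One quick sanity check under these assumptions: passing the same array as both `text_emb` and `image_emb` should return 1.0 for any k, since each row is its own nearest neighbour.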

This is my recall at k. I obtain an R@1 of 17%, while most papers that fine-tune CLIP report at least 60% recall at 1. Any idea what I could be doing wrong?

Jun 11 '25 14:06 asczdvefb42

Any fix for this? I've been running into the same problem recently.

Aug 23 '25 18:08 ZJ331456