CLIP
Recall (R@k) much lower than the values reported in papers
Here is what I do:
- Get a test dataset of image-caption pairs (IIITD-20K dataset)
- Calculate embeddings with my fine-tuned CLIP
- Calculate the cosine distance between each text and all images
- Get the k closest images to each text; if the corresponding image is among them, add 1 to the score
- Get the recall by dividing the score by the length of the test dataset.
This is my recall at k. I obtain an R@1 of 17%, while most papers that fine-tune CLIP report an R@1 of at least 60%. Any idea what I could be doing wrong?
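For reference, the steps above can be sketched roughly as follows. This is only a minimal NumPy sketch, not the actual evaluation script: the function name `recall_at_k` and the argument names are made up for illustration, and it assumes the embeddings are already computed and that text `i` is paired with image `i`.

```python
import numpy as np

def recall_at_k(text_emb, image_emb, k):
    """Text-to-image recall@k, assuming text i is paired with image i."""
    # L2-normalize so that the dot product equals cosine similarity.
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)

    # Similarity matrix of shape (num_texts, num_images).
    sim = text_emb @ image_emb.T

    # Indices of the k most similar images per text (highest similarity first).
    topk = np.argsort(-sim, axis=1)[:, :k]

    # A hit means the paired image index appears among the top k.
    targets = np.arange(sim.shape[0])[:, None]
    hits = (topk == targets).any(axis=1)
    return hits.mean()
```

Note that the "closest" images are the ones with the highest cosine similarity, which is the same as the lowest cosine distance, so the sort direction has to match whichever of the two is computed.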
Any fix for this? I've been running into the same problem recently.