Question about instance recognition metrics
Hi,
In Table 9 (Evaluation of frozen features on instance-level recognition) of the paper, the performance of OpenCLIP-G/14 is reported as 50.7 on Oxford-M and 19.7 on Oxford-H. However, we only get 39.4 on Oxford-M and 11.7 on Oxford-H (even without the 1M distractors) using the evaluation code at https://github.com/filipradenovic/revisitop/blob/master/python/evaluate.py#L39
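For reference, here is roughly how we compute these numbers. This is a minimal sketch assuming L2-normalized `(D, N)` feature matrices; `compute_map` and the medium/hard junk handling follow revisitop's python example code:

```python
import numpy as np
from evaluate import compute_map  # from filipradenovic/revisitop python/

def eval_medium_hard(query_feats, db_feats, gnd):
    # query_feats: (D, nq), db_feats: (D, n), both L2-normalized
    sim = np.dot(db_feats.T, query_feats)   # (n, nq) cosine similarities
    ranks = np.argsort(-sim, axis=0)        # database ranks, one column per query

    # Medium protocol: easy + hard are positives, junk is ignored
    gnd_m = [{'ok': np.concatenate([g['easy'], g['hard']]),
              'junk': g['junk']} for g in gnd]
    map_m, _, _, _ = compute_map(ranks, gnd_m)

    # Hard protocol: only hard are positives, easy + junk are ignored
    gnd_h = [{'ok': g['hard'],
              'junk': np.concatenate([g['junk'], g['easy']])} for g in gnd]
    map_h, _, _, _ = compute_map(ranks, gnd_h)
    return map_m, map_h
```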
We also ran revisited Oxford (without the 1M distractors) with the distilled DINOv2 ViT-B/14 backbone, using the make_classification_eval_transform() transform from this repo; a sketch of our extraction code is below. The metrics we get are 0.58 on Oxford-M and 0.337 on Oxford-H, which is much lower than the numbers reported in the paper (0.729 on Oxford-M and 0.495 on Oxford-H).
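For completeness, this is essentially the feature-extraction loop we use. It is a sketch: `dinov2_vitb14` via torch.hub and `make_classification_eval_transform()` come from this repo, while the image loop and normalization are our own glue code:

```python
import torch
from PIL import Image
from dinov2.data.transforms import make_classification_eval_transform

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14').cuda().eval()
transform = make_classification_eval_transform()

@torch.no_grad()
def extract_features(image_paths):
    feats = []
    for path in image_paths:
        img = transform(Image.open(path).convert('RGB')).unsqueeze(0).cuda()
        f = model(img)  # CLS token embedding, shape (1, 768) for ViT-B/14
        feats.append(torch.nn.functional.normalize(f, dim=-1))
    # Return a (D, N), L2-normalized matrix, as expected by the ranking code above
    return torch.cat(feats).T.cpu().numpy()
```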
If possible, could you help clarify:
- Which metric are you reporting in the paper: mean average precision (mAP) or mean precision at the kappa cutoffs (mP@k)?
- Do you include the 1M distractors in the evaluation?
- Which transform should we use with the released backbone?
Similarly for the Met dataset, we cannot reproduce the reported metrics for either OpenCLIP-G/14 or DINOv2 ViT-B/14.
It would be great if you could provide the code used to run on the eval sets, or the generated embeddings!
Thanks!