
Question about instance recognition metrics

Open strbcks opened this issue 1 year ago • 0 comments

Hi,

In Table 9 (Evaluation of frozen features on instance-level recognition), the reported performance for OpenCLIP-G/14 is 50.7 on Oxford-M and 19.7 on Oxford-H. However, we only get 39.4 on Oxford-M and 11.7 on Oxford-H (even without the 1M distractors) using the evaluation code at https://github.com/filipradenovic/revisitop/blob/master/python/evaluate.py#L39
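
For reference, here is roughly how we run the evaluation (a minimal sketch: `vecs`, `qvecs`, and `gnd` are placeholders for our extracted database/query features and the ground-truth annotations loaded from the revisitop pickle):

```python
import numpy as np
from evaluate import compute_map  # python/evaluate.py from filipradenovic/revisitop

# vecs: (D, num_db) database features, qvecs: (D, num_q) query features,
# both L2-normalized; gnd: per-query ground truth with 'easy'/'hard'/'junk' lists.
sim = np.dot(vecs.T, qvecs)        # cosine similarity
ranks = np.argsort(-sim, axis=0)   # database indices sorted by similarity, per query

# Medium protocol: easy + hard count as positives, junk is ignored.
gnd_m = [{'ok': np.concatenate([g['easy'], g['hard']]),
          'junk': g['junk']} for g in gnd]
# Hard protocol: only hard count as positives, easy + junk are ignored.
gnd_h = [{'ok': g['hard'],
          'junk': np.concatenate([g['junk'], g['easy']])} for g in gnd]

mapM, _, mprM, _ = compute_map(ranks, gnd_m, kappas=[1, 5, 10])
mapH, _, mprH, _ = compute_map(ranks, gnd_h, kappas=[1, 5, 10])
print(f'Oxford-M mAP: {mapM * 100:.1f}, Oxford-H mAP: {mapH * 100:.1f}')
```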

We also tried revisited Oxford (without the 1M distractors) with the distilled DINOv2 ViT-B/14 backbone from this repo, using the make_classification_eval_transform() transform; the metrics we get are 0.580 for Oxford-M and 0.337 for Oxford-H, which are much lower than the numbers reported in the paper (0.729 for Oxford-M and 0.495 for Oxford-H).
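
Concretely, our feature extraction looks roughly like this (a sketch; `extract()` is our own helper, and its output is what we feed into the revisitop evaluation above):

```python
import torch
from PIL import Image
from dinov2.data.transforms import make_classification_eval_transform

# Distilled ViT-B/14 backbone from this repo's torch hub entry points.
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitb14').eval().cuda()
transform = make_classification_eval_transform()  # resize, center-crop, normalize

@torch.no_grad()
def extract(paths):
    # Returns a (D, N) L2-normalized feature matrix in the layout revisitop expects.
    feats = []
    for p in paths:
        img = transform(Image.open(p).convert('RGB')).unsqueeze(0).cuda()
        feats.append(torch.nn.functional.normalize(model(img), dim=-1))
    return torch.cat(feats).T.cpu().numpy()
```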

If possible, could you help clarify:

  1. What metric are you reporting in the paper: mean average precision (mAP) or mean precision at the kappa cutoffs (mP@k)? (See the toy illustration after this list.)
  2. Are you including the 1M distractors in the evaluation?
  3. What transform should we use with the released backbone?
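
To make question 1 concrete, here is a toy, self-contained illustration of the two metrics we are unsure between (our own example, not taken from the paper or the eval code):

```python
import numpy as np

# One query; a ranked list where positions 0 and 3 are relevant (1 = relevant).
rel = np.array([1, 0, 0, 1, 0])

# Average precision: mean of the precision at each relevant position.
hits = np.cumsum(rel)
prec_at_rel = hits[rel == 1] / (np.flatnonzero(rel) + 1)  # [1/1, 2/4]
ap = prec_at_rel.mean()                                   # 0.75

# Precision at kappa: fraction of relevant items among the top-k results.
p_at_5 = rel[:5].mean()                                   # 0.40
print(ap, p_at_5)
```

The paper's numbers (e.g. 50.7 / 19.7) could plausibly be either, which is why we ask.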

Similarly for the Met dataset, we cannot reproduce the reported metrics for either OpenCLIP-G/14 or DINOv2 ViT-B/14.

It would be great if you could provide the evaluation code or the generated embeddings for these benchmarks!

Thanks!

strbcks · Apr 21 '23 16:04