
How do you get thresholds for the CLIP model in the results of Table 3?

Open fmy7834 opened this issue 1 year ago • 4 comments

Hi, I want to know how you get the thresholds for the CLIP model in the results of Table 3. Is it done the same way as you described in another issue? [image attached]

fmy7834 avatar Aug 23 '23 03:08 fmy7834

Similar to the zero-shot inference of CLIP on ImageNet, we directly use "cross-modal feature similarity + threshold" for image tagging evaluation. It is worth noting that this approach is very sensitive to the choice of threshold and is difficult to apply in practice.

xinyu1205 avatar Aug 23 '23 07:08 xinyu1205
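For reference, here is a minimal sketch (not the authors' code) of the "cross-modal feature similarity + threshold" tagging described above, using the open-source OpenAI `clip` package. The tag list, image path, and the 0.2 threshold are all illustrative assumptions.

```python
# Minimal zero-shot multi-label tagging with CLIP: score each tag prompt by
# cosine similarity with the image embedding, then keep tags above a threshold.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

tags = ["dog", "cat", "person", "bicycle", "tree"]  # hypothetical tag list
prompts = clip.tokenize([f"a photo of a {t}" for t in tags]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)
    # Normalize so the dot product equals cosine similarity
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.T).squeeze(0)

threshold = 0.2  # illustrative value; results are very sensitive to this choice
predicted = [t for t, s in zip(tags, sims.tolist()) if s > threshold]
print(predicted)
```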

But I see that the results of CLIP on the multi-label classification datasets in Table 3 are competitive. Could you tell me in detail how you determined the thresholds?

fmy7834 avatar Aug 23 '23 10:08 fmy7834

We just manually adjusted the threshold to achieve the best performance for CLIP. For a fair comparison, each model in Table 3 of the RAM paper uses a unified threshold shared across all categories.

xinyu1205 avatar Aug 23 '23 11:08 xinyu1205
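A minimal sketch of this tuning procedure (an assumption, not the paper's evaluation script): sweep candidate values of a single threshold applied to every category and keep the one that gives the best overall F1 on the validation scores. The candidate range and micro-averaged F1 are illustrative choices.

```python
# Pick one unified threshold for all categories by sweeping candidates
# and keeping the value with the best micro-averaged F1.
import numpy as np
from sklearn.metrics import f1_score

def best_unified_threshold(scores, labels, candidates=np.linspace(0.0, 0.5, 51)):
    """scores: (N, C) similarity matrix; labels: (N, C) binary ground truth."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        preds = (scores > t).astype(int)
        f1 = f1_score(labels, preds, average="micro", zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical usage with placeholder data:
# scores = np.random.rand(100, 20)
# labels = np.random.randint(0, 2, (100, 20))
# t, f1 = best_unified_threshold(scores, labels)
```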

Got it. Thank you very much!

fmy7834 avatar Aug 23 '23 11:08 fmy7834