recognize-anything
How do you get the thresholds for the CLIP model in the results of Table 3?
Hi, I want to know how you obtained the thresholds for the CLIP model in the results of Table 3. Is it the same way you described in another issue?
Similar to the zero-shot inference of CLIP on ImageNet, we directly use "cross-modal feature similarity + threshold" for image tagging evaluation. It is worth noting that this approach is very sensitive to the choice of threshold and is difficult to apply in practice.
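For reference, here is a minimal sketch of this "cross-modal feature similarity + threshold" tagging approach using the openai/CLIP package. The tag list, prompt template, checkpoint, and threshold value below are illustrative assumptions, not the exact settings used for the paper's results.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

# Illustrative candidate tags; the paper evaluates on a dataset's full label set.
tags = ["dog", "cat", "person", "car", "tree"]
text = clip.tokenize([f"a photo of a {t}" for t in tags]).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    # Cosine similarity between the image and each tag prompt.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    sims = (image_feat @ text_feat.T).squeeze(0)  # shape: (num_tags,)

threshold = 0.25  # illustrative value; results are very sensitive to this choice
predicted = [t for t, s in zip(tags, sims.tolist()) if s > threshold]
print(predicted)
```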
But I see that the results of CLIP on the multi-label classification datasets in Table 3 are competitive. Could you explain in detail how you determined the thresholds?
We just manually adjusted the threshold to achieve the best performance for CLIP. For a fair comparison, each model in Table 3 of the RAM paper uses a single unified threshold shared across all categories.
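A minimal sketch of what such a sweep over a single shared threshold could look like, assuming precomputed similarity scores and binary ground-truth labels; the metric (micro-F1) and the search grid here are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_unified_threshold(scores, labels, grid=np.linspace(0.1, 0.4, 31)):
    """Pick one threshold shared by all categories that maximizes micro-F1.

    scores: (num_images, num_tags) array of image-text similarities
    labels: (num_images, num_tags) binary ground-truth tag matrix
    """
    best_t, best_f1 = None, -1.0
    for t in grid:
        preds = (scores > t).astype(int)
        # Flattening image/tag pairs gives micro-averaged F1.
        f1 = f1_score(labels.ravel(), preds.ravel())
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```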
Got it. Thank you very much!