VL-CheckList
Reproducing the CLIP scores from the paper
Hi,
Thanks for open-sourcing the code.
I'm trying to reproduce the CLIP scores from the paper but have not been able to.
I use the sample config file, changing MODEL_NAME to CLIP (ViT-L/14).
I evaluate all the datasets in the corpus and then average the final accuracies.
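Concretely, the averaging step looks roughly like this (a minimal sketch; the output directory and the JSON key name are from memory, so they may differ from your setup):

```python
import glob
import json
from statistics import mean

# Collect the per-dataset accuracies written by the evaluator.
# The path pattern and the "total_acc" key are assumptions, not exact names.
accs = []
for path in glob.glob("output/CLIP_ViT-L14/*.json"):
    with open(path) as f:
        accs.append(json.load(f)["total_acc"])

# The final number is the unweighted mean over all datasets in the corpus.
print(mean(accs))
```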
I got the following scores, which are quite different from the paper's:
Object: 0.8205209550766983
Attribute: 0.6806109948697314
Relation: 0.67975
How can I reproduce the scores in the paper?
Hi, @kkjh0723
Did you have to make any changes to the code in order to get it working? I am also trying to replicate the CLIP result but am unable to do so.
Thanks!
@ayushchakravarthy, if I remember correctly, some minor changes were required to run CLIP.
In the following lines, I changed `result_tmp[i][0][1]` to `result_tmp[i][0][0]` and `result_tmp[i][1][1]` to `result_tmp[i][1][0]`.
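Putting both changes together, the affected lines end up reading element 0 instead of element 1 of each inner pair. Roughly (reconstructed from memory, so the variable names on the left are only illustrative):

```python
# Illustrative names; the only real change is the final index, 1 -> 0,
# to match how the CLIP wrapper orders each inner pair.
pos_score = result_tmp[i][0][0]  # was result_tmp[i][0][1]
neg_score = result_tmp[i][1][0]  # was result_tmp[i][1][1]
```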
Also, in these lines, I changed it as follows:

```python
sample_t = random.sample(sample_true, self.sample_num if len(sample_true) > self.sample_num else len(sample_true))
sample_f = random.sample(sample_false, self.sample_num if len(sample_false) > self.sample_num else len(sample_false))
```
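For what it's worth, that conditional is just a guard against asking `random.sample` for more items than exist; an equivalent and perhaps clearer way to write the same thing is with `min()`:

```python
# Equivalent form: never request more items than are available.
sample_t = random.sample(sample_true, min(self.sample_num, len(sample_true)))
sample_f = random.sample(sample_false, min(self.sample_num, len(sample_false)))
```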
Hi @kkjh0723, have you reproduced the results of this work? I have tried many times, but the end result is not satisfactory. I used CLIP (ViT-B/32) as my model and selected the "ITM" task for testing. My final average scores are:
Attribute: 68.6477405706409
Relation: 74.7221415628598
Object: 89.4515112110188
These results are much higher than the paper's. So I'd like to know how much data you used, since your results don't vary that much. Thank you!