VL-CheckList
Reproducing the CLIP scores from the paper
Hi,
Thanks for open-sourcing the code.
I'm trying to reproduce the CLIP scores from the paper but have not been able to.
I use the sample config file, changing MODEL_NAME to CLIP (ViT-L/14).
I evaluate all the datasets in the corpus and then average the final accuracies.
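Concretely, the averaging step looks roughly like this (a minimal sketch; the output directory and the JSON key name are from memory, so they may differ from your setup):

```python
import glob
import json
from statistics import mean

# Collect the per-dataset accuracies written by the evaluator.
# The path pattern and the "total_acc" key are assumptions, not exact names.
accs = []
for path in glob.glob("output/CLIP_ViT-L14/*.json"):
    with open(path) as f:
        accs.append(json.load(f)["total_acc"])

# The final number is the unweighted mean over all datasets in the corpus.
print(mean(accs))
```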
I got the following scores, which are quite different from the paper's:
Object: 0.8205209550766983
Attribute: 0.6806109948697314
Relation: 0.67975
How can I reproduce the scores in the paper?
Hi, @kkjh0723
Did you have to make any changes to the code in order to get it working? I am also trying to replicate the CLIP result but am unable to do so.
Thanks!
@ayushchakravarthy, if I remember correctly, some minor changes were required to run CLIP.
In the following lines, I changed `result_tmp[i][0][1]` to `result_tmp[i][0][0]` and `result_tmp[i][1][1]` to `result_tmp[i][1][0]`.
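Putting both changes together, the affected lines end up reading element 0 instead of element 1 of each inner pair. Roughly (reconstructed from memory, so the variable names on the left are only illustrative):

```python
# Illustrative names; the only real change is the final index, 1 -> 0,
# to match how the CLIP wrapper orders each inner pair.
pos_score = result_tmp[i][0][0]  # was result_tmp[i][0][1]
neg_score = result_tmp[i][1][0]  # was result_tmp[i][1][1]
```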
Also, in these lines, I changed it as follows:

```python
sample_t = random.sample(sample_true, self.sample_num if len(sample_true) > self.sample_num else len(sample_true))
sample_f = random.sample(sample_false, self.sample_num if len(sample_false) > self.sample_num else len(sample_false))
```
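For what it's worth, that conditional is just a guard against asking `random.sample` for more items than exist; an equivalent and perhaps clearer way to write the same thing is with `min()`:

```python
# Equivalent form: never request more items than are available.
sample_t = random.sample(sample_true, min(self.sample_num, len(sample_true)))
sample_f = random.sample(sample_false, min(self.sample_num, len(sample_false)))
```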
Hi @kkjh0723, have you reproduced the results of this work? I have tried many times, but the end result is not satisfactory. I used CLIP (ViT-B/32) as my model and selected the "ITM" task for testing. My final average scores are:
Attribute: 68.6477405706409
Relation: 74.7221415628598
Object: 89.4515112110188
These results are much higher than the paper's. So I'd like to know how much data you used, since your results don't vary that much. Thank you!