The accuracy result on the CIFAR100 dataset

realTaki opened this issue 3 years ago • 15 comments

I used the code in the README.md (Zero-Shot Prediction) to test the accuracy of ViT-B/32 on the CIFAR100 dataset and got about 62% top-1 accuracy. This is lower than the 65.1% reported in the paper. How was zero-shot accuracy measured in the paper? Can you give some details to help me reproduce the reported result, or do you have any idea how to troubleshoot this?

realTaki · Sep 13 '21

Hi, see https://github.com/openai/CLIP/blob/main/data/prompts.md#cifar100 where you can now find the class names and prompts for ensembling zero-shot predictions for CIFAR100.
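
For reference, a minimal sketch of the ensembling procedure, following the approach in the repo's `notebooks/Prompt_Engineering_for_ImageNet.ipynb`; here `cifar100_classes` and `cifar100_templates` stand in for the lists in `data/prompts.md`:

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def zeroshot_classifier(classnames, templates):
    # Build one text embedding per class by averaging over all prompt templates.
    with torch.no_grad():
        weights = []
        for classname in classnames:
            texts = clip.tokenize([t.format(classname) for t in templates]).to(device)
            emb = model.encode_text(texts)
            emb /= emb.norm(dim=-1, keepdim=True)
            emb = emb.mean(dim=0)               # average over templates
            weights.append(emb / emb.norm())    # re-normalize the mean embedding
        return torch.stack(weights, dim=1)

# zeroshot_weights = zeroshot_classifier(cifar100_classes, cifar100_templates)
# logits = 100. * image_features @ zeroshot_weights   # image_features L2-normalized
```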

jongwook · Sep 24 '21

Hello! Have you tried to reproduce the results on VOC2007? I got only 71% mAP with ViT-B/32 using the official class names and prompts, which is well below the reported 83.1%. Do you have any suggestions? Thank you very much for your help.

weiyx16 · Oct 27 '21

Have you checked the order of the categories in VOC against the order in the prompts? FYI, the default order in the prompts is:

```python
classes = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
    'bus', 'car', 'cat', 'chair', 'cow',
    'dog', 'horse', 'motorbike', 'person', 'sheep',
    'sofa', 'diningtable', 'pottedplant', 'train', 'tvmonitor',
]
```

In VOC, the order is:

```python
pascalvoc2007_classes = [
    'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
    'bus', 'car', 'cat', 'chair', 'cow',
    'diningtable', 'dog', 'horse', 'motorbike', 'person',
    'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]
```

Reorder the prompts to match the VOC order and you should get about 82.5 mAP with ViT-B/32.
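
For illustration, a hypothetical snippet to apply the fix, assuming `zeroshot_weights` has one text-embedding column per class in the prompt order above:

```python
# Permute the text-embedding columns so column i matches pascalvoc2007_classes[i].
perm = [classes.index(c) for c in pascalvoc2007_classes]
zeroshot_weights = zeroshot_weights[:, perm]
```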

xcpeng · Oct 27 '21

By the way, the issue is the index of 'diningtable' in the prompts.

xcpeng · Oct 27 '21

Thank you very much for your help!! I had actually noticed the prompt-order problem and fixed it beforehand, but I still couldn't reproduce the results. Also, a correction: the number I reproduced is 75%, not the 71% I mentioned before. The key question that keeps confusing me is how to calculate the 11-point mAP.

Here is how I reproduce the results: https://github.com/weiyx16/vocreproduce/blob/main/reproduce.py. Just run `reproduce.py` and it will download the model and the dataset automatically.
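
For reference, a minimal sketch of the 11-point interpolated AP used by VOC2007, computed for a single class; `scores` and `labels` are assumed to be per-image confidences and binary ground truth for that class, and mAP is the mean of the AP over the 20 classes:

```python
import numpy as np

def eleven_point_ap(scores, labels):
    # Sort images by descending confidence for this class.
    order = np.argsort(-scores)
    hits = labels[order]
    cum_hits = np.cumsum(hits)
    precision = cum_hits / np.arange(1, len(hits) + 1)
    recall = cum_hits / max(hits.sum(), 1)
    # Average the maximum precision at recall >= t, for t in {0.0, 0.1, ..., 1.0}.
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 11.0
```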

weiyx16 · Oct 29 '21


I think I fixed the bug by adding a softmax after the logits. This operation does not affect the accuracy on other datasets, but it does affect the sorting in the mAP calculation.
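
A tiny illustration with made-up numbers: the per-image argmax is invariant to softmax, so top-1 accuracy is unchanged, but mAP ranks *images* by a single class's score, and softmax divides each image's logits by a different normalizer, which can reorder the images:

```python
import torch

logits = torch.tensor([[3.0, 2.9],    # image 0: class 0 barely wins
                       [1.0, -2.0]])  # image 1: class 0 wins by a large margin
probs = logits.softmax(dim=-1)
print(logits[:, 0])  # tensor([3., 1.])          -> image 0 ranked first for class 0
print(probs[:, 0])   # approx tensor([0.52, 0.95]) -> image 1 ranked first for class 0
```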

weiyx16 · Nov 02 '21

@weiyx16 Greetings. Have you ever tried CLIP on the StanfordCars dataset? I can only get ~47%/~48% accuracy without/with the "prompt ensembling" trick, far from the ~55% reported in the original paper. Could you give me some possible clues, please?

machengcheng2016 · Jun 07 '22

@machengcheng2016 We tried it before and were able to reproduce the zero-shot performance on StanfordCars with the RN50 backbone (about 55.0% in our experiments); nothing special needs checking. Have you verified the dataset? It has 8041 test images. And have you reproduced the results on other datasets?
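
For example, a quick sanity check with torchvision (hypothetical root path; the original download links have been flaky, so `download=True` may fail in recent versions and you may need a local copy under `root`):

```python
from torchvision.datasets import StanfordCars

ds = StanfordCars(root="./data", split="test", download=True)
assert len(ds) == 8041, f"unexpected test-set size: {len(ds)}"
```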

weiyx16 · Jun 10 '22

I solved my problem. It was the data augmentation that was tricking me.

machengcheng2016 · Jun 10 '22

I also tested the accuracy of ViT-B/32 on CIFAR100 and got 61.67%. Have you solved this problem? Can you give some details on how to reproduce the reported result?

QMiao-cs · Nov 26 '22

It might come from data augmentation. Please make sure you are using the correct one.

machengcheng2016 · Nov 26 '22

The images are loaded by the following code from the `README.md`:

```python
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)
image, class_id = cifar100[3637]
```

Is there any data augmentation?

QMiao-cs · Nov 26 '22

Hi, I wonder what the correct data augmentation setting would be. I used the standard validation-set transforms for CIFAR100, plus resizing to 224.

Calmepro777 · Dec 01 '22

Please refer to https://github.com/openai/CLIP/blob/fcab8b6eb92af684e7ff0a904464be7b99b49b88/notebooks/Prompt_Engineering_for_ImageNet.ipynb for this concern.
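
In short, use the `preprocess` transform returned by `clip.load()` rather than the standard CIFAR validation transforms. It is approximately equivalent to the following sketch (the normalization constants are CLIP's own statistics):

```python
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Roughly what clip.load("ViT-B/32") returns as `preprocess`:
preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.Lambda(lambda im: im.convert("RGB")),  # PIL image -> RGB
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
```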

shyammarjit · Sep 28 '23

Could anyone tell me why the "order in prompts" matters? Thanks in advance.

jiachengc · Apr 24 '24