CoOp
Cannot reproduce the results of CoOp and CoCoOp
Hi, thanks for the great work. However, I have found it hard to reproduce the results reported in the paper.
For example, using the released checkpoints in https://github.com/KaiyangZhou/CoOp#models-and-results, the results of vit-b32-ep50 (nctx=16, shots=16, ctp=end, csc=False) on ImageNet are:
setting | transform | seed1 | seed2 | seed3
---|---|---|---|---
paper | - | 66.85 | - | -
released checkpoint (inference only) | ["random_resized_crop", "random_flip", "normalize"] | 64.38 | 64.72 | 64.72
released checkpoint (inference only) | ["random_flip", "random_translation", "center_crop", "normalize"] | 65.11 | 65.32 | 65.34
our reproduction (training from scratch, then inference) | ["random_resized_crop", "random_flip", "normalize"] | 65.21 | - | -
They are all clearly lower (64.38 to 65.34) than the result in the paper (66.85), and evaluating the released checkpoint with the updated transform from https://github.com/KaiyangZhou/CoOp/issues/8#issue-1021634747 gives even worse performance.
For CoCoOp, the reported result of vit-b16-ep10 (nctx=4, shots=16, ctp=end) on ImageNet is 71.02, but our reproduction (training from scratch, then inference) reaches only 70.14, which is also lower.
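To make the size of the gaps explicit, here is a small Python sketch (an illustration only, not part of CoOp or Dassl) that averages the seeds in each row of the table and subtracts the mean from the paper number. The dictionary keys are hypothetical labels for the table rows; the accuracies are copied from the table and text above, and the reproduction row only has seed1.

```python
from statistics import mean, pstdev

# ImageNet, vit-b32-ep50 (nctx=16, shots=16, ctp=end, csc=False), from the table.
PAPER_COOP = 66.85
coop_runs = {
    # hypothetical row labels; values are top-1 accuracy (%) per seed
    "ckpt_rrc_flip": [64.38, 64.72, 64.72],       # random_resized_crop + random_flip
    "ckpt_flip_trans_cc": [65.11, 65.32, 65.34],  # random_flip + random_translation + center_crop
    "repro_seed1": [65.21],                       # trained from scratch, seed1 only
}

# CoCoOp, vit-b16-ep10 (nctx=4, shots=16, ctp=end), from the text.
PAPER_COCOOP = 71.02
REPRO_COCOOP = 70.14


def summarize(runs, reference):
    """Return {label: (mean, population std, reference - mean)}, rounded to 2 decimals."""
    out = {}
    for label, accs in runs.items():
        m = mean(accs)
        out[label] = (round(m, 2), round(pstdev(accs), 2), round(reference - m, 2))
    return out


if __name__ == "__main__":
    for label, (m, s, gap) in summarize(coop_runs, PAPER_COOP).items():
        print(f"{label}: mean={m:.2f} std={s:.2f} gap_to_paper={gap:.2f}")
    print(f"cocoop: gap_to_paper={PAPER_COCOOP - REPRO_COCOOP:.2f}")
```

With these numbers the seed-to-seed spread of the released checkpoints is well below the 1.6 to 2.2 point gap to the paper, which is why the difference does not look like seed noise.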
Our environment: GPU: V100-32G / Titan RTX; dassl=0.4.2; torch=1.7.1+cu110; torchvision=0.8.2+cu110
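For comparing environments, a script like the following can collect the same information automatically. It is a sketch that uses only the standard library plus an optional `torch` import; the distribution name `dassl` is an assumption about how Dassl.pytorch registers itself with pip, and the cuDNN/GPU fields are only filled in when PyTorch is importable.

```python
import platform
from importlib import metadata

# Assumed pip distribution names; "dassl" is a guess for Dassl.pytorch.
PACKAGES = ("torch", "torchvision", "dassl")


def collect_versions(packages=PACKAGES):
    """Map python and each package to its installed version, or 'not installed'."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    # Optional CUDA/cuDNN/GPU details, only if PyTorch can be imported.
    try:
        import torch
        info["cuda"] = str(torch.version.cuda)
        info["cudnn"] = str(torch.backends.cudnn.version())
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}={value}")
```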
Am I missing something? Thanks a lot.