CoOp
Performance of full fine-tuning on ResNet
Hi, thanks for the nice code. I found that performance is poor when fully fine-tuning the ResNet-based CLIP on ImageNet, whereas it is good for the ViT-based CLIP. Do you have any insight into why full fine-tuning or linear probing of the ResNet-based CLIP makes the performance worse?