
ViT Performance

Open ruizhaoz opened this issue 10 months ago • 1 comments

Hello, I tried out the code; it is very neat and works fine. One thing I noticed is that the ViT-B performance (Table 9) is much worse than the ResNet18 performance. Any idea why that is? Thanks!

ruizhaoz avatar Apr 05 '24 15:04 ruizhaoz

ViT is known to overfit significantly on small training datasets, even under strong augmentations, because it does not build in the inductive priors that CNNs have, such as translation equivariance. For ViT to match and eventually beat CNNs, we need a dataset at least on the scale of tiered-ImageNet or ImageNet.
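The translation-equivariance prior mentioned above can be illustrated with a minimal numpy sketch (this is my own toy example, not code from this repo): a convolution with shared weights commutes with shifting the input, while a generic dense layer, roughly analogous to position-dependent weights in a ViT, does not. A ViT must learn this invariance from data, which is one reason it needs much larger training sets.

```python
import numpy as np

def conv1d_circular(x, k):
    """1-D convolution with circular padding and shared weights (CNN-like)."""
    n, m = len(x), len(k)
    return np.array([sum(k[j] * x[(i + j) % n] for j in range(m)) for i in range(n)])

rng = np.random.default_rng(0)
x = rng.normal(size=8)
k = np.array([0.5, -1.0, 0.5])
shift = 3

# Convolution is translation-equivariant: conv(shift(x)) == shift(conv(x)).
conv_of_shifted = conv1d_circular(np.roll(x, shift), k)
shifted_conv = np.roll(conv1d_circular(x, k), shift)
print(np.allclose(conv_of_shifted, shifted_conv))  # True

# A generic dense layer (position-dependent weights) has no such prior:
W = rng.normal(size=(8, 8))
dense_of_shifted = W @ np.roll(x, shift)
shifted_dense = np.roll(W @ x, shift)
print(np.allclose(dense_of_shifted, shifted_dense))  # False
```

The convolution result holds exactly because the same kernel is applied at every position; the dense layer fails because each output coordinate has its own weights, so the model has to learn shift-invariance from many examples instead of getting it for free.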

xuanlinli17 avatar Apr 05 '24 17:04 xuanlinli17