
Different behaviours on VisDA-2017 using different pretrained models from timm

Open · swift-n-brutal opened this issue · 5 comments

Thanks for the great work. I ran into two problems when conducting the experiment with ViT on VisDA-2017.

  1. It seems that the ViT backbone doesn't match the bottleneck when --no-pool is set. The output of the ViT backbone is a sequence of tokens instead of a single class token, so the BatchNorm1d layer in the bottleneck complains about the input dimension.
  2. I fixed the previous problem by adding a pool layer to extract the class token (see the runnable sketch below):

         pool_layer = (lambda _x: _x[:, 0]) if args.no_pool else None

     Then I used the exact command from examples/run_visda.sh to run CDAN_MCC_SDAT:

         python cdan_mcc_sdat.py data/visda-2017 -d VisDA2017 -s Synthetic -t Real -a vit_base_patch16_224 --epochs 15 --seed 0 --lr 0.002 --per-class-eval --train-resizing cen.crop --log logs/cdan_mcc_sdat_vit/VisDA2017 --log_name visda_cdan_mcc_sdat_vit --gpu 0 --no-pool --rho 0.02 --log_results

     Finally I get a slightly lower accuracy, as below:

         global correct: 86.0
         mean correct: 88.3
         mean IoU: 78.5
         +------------+-------------------+--------------------+
         |   class    |        acc        |        iou         |
         +------------+-------------------+--------------------+
         | aeroplane  | 97.83323669433594 | 96.3012924194336   |
         | bicycle    | 88.43165588378906 | 81.25331115722656  |
         | bus        | 81.79104614257812 | 72.69281768798828  |
         | car        | 78.06941986083984 | 67.53160095214844  |
         | horse      | 97.31400299072266 | 92.78455352783203  |
         | knife      | 96.91566467285156 | 82.31681823730469  |
         | motorcycle | 94.9102783203125  | 83.37374877929688  |
         | person     | 81.3499984741211  | 58.12790298461914  |
         | plant      | 94.04264831542969 | 89.68553161621094  |
         | skateboard | 95.87899780273438 | 81.48286437988281  |
         | train      | 94.05099487304688 | 87.69535064697266  |
         | truck      | 59.04830551147461 | 48.311458587646484 |
         +------------+-------------------+--------------------+
         test_acc1 = 86.0

     I notice that epochs is set to 15 in the script. Is the experiment setting correct? How can I get the reported accuracy? Many thanks.
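For anyone hitting the same shape mismatch, here is a minimal self-contained sketch of the fix, assuming timm 0.6.x behaviour (where global_pool='' makes the backbone return the full token sequence); the 256-dim bottleneck is only illustrative, not the exact SDAT/TLlib module:

```python
import timm
import torch
import torch.nn as nn

# ViT backbone without a classification head; with global_pool='' (timm 0.6.x)
# forward() returns the full token sequence of shape (batch, tokens, dim).
backbone = timm.create_model('vit_base_patch16_224', pretrained=False,
                             num_classes=0, global_pool='')

# The fix: keep only the class token so downstream layers see (batch, dim).
pool_layer = lambda x: x[:, 0]

# Illustrative bottleneck with the BatchNorm1d that complained (sizes are assumptions).
bottleneck = nn.Sequential(nn.Linear(768, 256), nn.BatchNorm1d(256), nn.ReLU())

images = torch.randn(2, 3, 224, 224)
tokens = backbone(images)          # (2, 197, 768) on timm 0.6.x
features = pool_layer(tokens)      # (2, 768)
print(bottleneck(features).shape)  # torch.Size([2, 256]); BatchNorm1d no longer complains
```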

swift-n-brutal · Sep 15 '22

After some research, the first problem has been resolved. The default behaviour of ViT.forward() changed between timm versions: with global_pool='', the backbone returns the class token x[:, 0] in timm v0.5.x, while it returns the full token sequence x in timm v0.6.7.
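The version difference also shows up in forward_features, so a quick diagnostic (a sketch, not part of the SDAT scripts) is to print what the installed timm returns:

```python
import timm
import torch

print('timm', timm.__version__)
model = timm.create_model('vit_base_patch16_224', pretrained=False, num_classes=0)
feats = model.forward_features(torch.randn(1, 3, 224, 224))
# Per the observation above:
#   timm 0.5.x -> torch.Size([1, 768])       (class token already extracted)
#   timm 0.6.x -> torch.Size([1, 197, 768])  (full token sequence; pool it yourself)
print(feats.shape)
```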

swift-n-brutal · Sep 16 '22

Hi @swift-n-brutal, are you now able to reproduce the reported accuracy on the VisDA dataset?

rangwani-harsh · Sep 24 '22

I can get a close result (89.8%) for CDAN+MCC+SDAT on VisDA, but I ran into a strange behaviour. As shown in the figure below, the validation accuracy (not the mAP) keeps going down as training proceeds, and the best result (mAP 89.9%) is achieved at the very first epoch. I then wondered whether the pretrained model was the problem and tested two checkpoints: vit_g ('https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz') from timm=0.5.x and vit_jx ('https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth') from timm=0.4.9. For vit_g the accuracy goes down, while for vit_jx the accuracy increases but the final mAP (88.6%) is much lower than with the former.

[figure: validation top-1 accuracy curves]
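In case it helps with a controlled comparison, here is a hedged sketch of how one might pin the backbone to the older 'jx' checkpoint quoted above, regardless of which weights the installed timm downloads by default; strict=False is used because the released state dict includes a 1000-class head that a num_classes=0 backbone does not have:

```python
import timm
import torch

url = ('https://github.com/rwightman/pytorch-image-models/releases/download/'
       'v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth')

backbone = timm.create_model('vit_base_patch16_224', pretrained=False,
                             num_classes=0, global_pool='')
state_dict = torch.hub.load_state_dict_from_url(url, map_location='cpu')
missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
# The 1000-class classifier ('head.weight', 'head.bias') is expected to appear
# under unexpected keys, since the head was dropped with num_classes=0.
print('missing:', missing)
print('unexpected:', unexpected)
```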

swift-n-brutal · Sep 27 '22


Have you found the reason? Is it a flaw in the pretrained model or a problem with how we run the experiment?

Wangzs0228 · Oct 18 '23

@Wangzs0228 It is fairly certain that smoothness regularization benefits transferability, robustness, generalization ability, etc., but the results may vary for a specific task. I have not been working on this task recently; you can run the experiments yourself to see whether the results match your expectations.

swift-n-brutal · Oct 20 '23