SDAT
Different behaviours on VisDA-2017 using different pretrained models from timm
Thanks for the great work. I ran into two problems when running the ViT experiments on VisDA-2017.
- The ViT backbone does not seem to match the bottleneck when no_pool is set. The ViT backbone outputs a sequence of tokens rather than a single class token, so the BatchNorm1d layer in the bottleneck complains about the input dimension.
- I fixed the previous problem by adding a pool layer that extracts the class token (note the parentheses, so that the conditional selects the pool layer rather than being evaluated inside the lambda):
pool_layer = (lambda _x: _x[:, 0]) if args.no_pool else None
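For context, a minimal self-contained sketch of why the class-token pool is needed; the 768-dim ViT-B/16 embedding and the Linear-to-BatchNorm1d bottleneck shape here are assumptions for illustration, not the repo's exact code:

```python
import torch
import torch.nn as nn

# ViT forward_features with global_pool='' can yield a token sequence of
# shape [batch, num_tokens, embed_dim]; BatchNorm1d in the bottleneck
# expects a 2-D input [batch, embed_dim], so the [CLS] token is pooled out.
def cls_token_pool(tokens: torch.Tensor) -> torch.Tensor:
    return tokens[:, 0]

bottleneck = nn.Sequential(
    nn.Linear(768, 256),    # 768 = ViT-B/16 embed dim (assumption)
    nn.BatchNorm1d(256),
    nn.ReLU(),
)

tokens = torch.randn(4, 197, 768)   # 197 = 1 [CLS] + 14*14 patch tokens
features = bottleneck(cls_token_pool(tokens))
print(features.shape)               # torch.Size([4, 256])
```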
Then I used the exact command in examples/run_visda.sh to run CDAN_MCC_SDAT:
python cdan_mcc_sdat.py data/visda-2017 -d VisDA2017 -s Synthetic -t Real -a vit_base_patch16_224 --epochs 15 --seed 0 --lr 0.002 --per-class-eval --train-resizing cen.crop --log logs/cdan_mcc_sdat_vit/VisDA2017 --log_name visda_cdan_mcc_sdat_vit --gpu 0 --no-pool --rho 0.02 --log_results
Finally, I got a slightly lower accuracy than reported:
global correct: 86.0
mean correct: 88.3
mean IoU: 78.5
+------------+-------------------+--------------------+
| class | acc | iou |
+------------+-------------------+--------------------+
| aeroplane | 97.83323669433594 | 96.3012924194336 |
| bicycle | 88.43165588378906 | 81.25331115722656 |
| bus | 81.79104614257812 | 72.69281768798828 |
| car | 78.06941986083984 | 67.53160095214844 |
| horse | 97.31400299072266 | 92.78455352783203 |
| knife | 96.91566467285156 | 82.31681823730469 |
| motorcycle | 94.9102783203125 | 83.37374877929688 |
| person | 81.3499984741211 | 58.12790298461914 |
| plant | 94.04264831542969 | 89.68553161621094 |
| skateboard | 95.87899780273438 | 81.48286437988281 |
| train | 94.05099487304688 | 87.69535064697266 |
| truck | 59.04830551147461 | 48.311458587646484 |
+------------+-------------------+--------------------+
test_acc1 = 86.0
I notice that the number of epochs is 15 in the script. Is this experiment setting correct? How can I get the reported accuracy? Many thanks.
After some research, the first problem has been resolved. The default behaviour of ViT.forward() changed across timm versions: when global_pool='', the backbone returns x[:, 0] in timm==0.5.x, while it returns x (the full token sequence) in timm==0.6.7.
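Given that version difference, one defensive way to get consistent behaviour (a sketch, not the repo's code) is to pool only when the backbone returns a 3-D token sequence:

```python
import torch

# Normalize the backbone output across timm versions: with global_pool='',
# timm 0.5.x returns the [CLS] token ([B, C]) from forward_features,
# while timm 0.6.7 returns the whole token sequence ([B, N, C]).
def extract_cls(features: torch.Tensor) -> torch.Tensor:
    return features[:, 0] if features.dim() == 3 else features

seq = torch.randn(2, 197, 768)   # timm 0.6.x-style output
vec = torch.randn(2, 768)        # timm 0.5.x-style output
print(extract_cls(seq).shape, extract_cls(vec).shape)  # both torch.Size([2, 768])
```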
Hi @swift-n-brutal, are you now able to get the correct accuracy on the VisDA dataset?
I can get a close result (89.8%) for CDAN+MCC+SDAT on VisDA, but I ran into strange behaviour. As shown in the image below, the validation accuracy (not the mAP) keeps decreasing as training proceeds, and the best result (mAP 89.9%) is achieved at the very first epoch. I then wondered whether the pretrained model was problematic, so I tested two models: vit_g ('https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz') from timm==0.5.x and vit_jx ('https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth') from timm==0.4.9. With vit_g the accuracy goes down, while with vit_jx the accuracy increases but the final mAP (88.6%) is much lower than the former.
Have you found the reason? Is it a flaw in the model or a problem with our setup?
@Wangzs0228 It is almost certain that the smoothness regularization is beneficial to transferability, robustness, generalization ability, etc. For a specific task, the results may vary. I am not working on this task at the moment; you can run the experiments to see whether the results match your expectations.
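For anyone who wants to probe that claim, here is a minimal sketch of the sharpness-aware (SAM-style) two-step update that SDAT's smoothness regularization is based on, with rho playing the role of the --rho 0.02 flag above. This is an illustration under those assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

# SAM-style update: (1) take the gradient at the current weights,
# (2) ascend by rho along the normalized gradient to an approximate
# worst-case point in the rho-ball, (3) take the gradient there,
# restore the weights, and step with the perturbed-point gradient.
def sam_step(model, compute_loss, optimizer, rho=0.02):
    compute_loss().backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)           # perturb toward the worst-case point
            eps.append(e)
    optimizer.zero_grad()
    compute_loss().backward()   # gradient at the perturbed point
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)           # undo the perturbation
    optimizer.step()
    optimizer.zero_grad()

torch.manual_seed(0)
model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
before = model.weight.detach().clone()
sam_step(model, lambda: nn.functional.mse_loss(model(x), y), opt)
```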