ViTs-vs-CNNs icon indicating copy to clipboard operation
ViTs-vs-CNNs copied to clipboard

Adversarial training is not working

Open ksouvik52 opened this issue 2 years ago • 12 comments

Hi, as you suggested in the earlier thread, we tried with the exact same settings and argument that you provided to do train on deit tiny adversarially. However, it is still not working, only 18% accuracy after 52 epochs.

{"train_lr": 9.999999999999953e-07, "train_loss": 6.91152723201459, "test_0_loss": 6.872483930142354, "test_0_acc1": 0.26, "test_0_acc5": 1.12, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 0, "n_parameters": 5717416} {"train_lr": 9.999999999999953e-07, "train_loss": 6.900039605433039, "test_0_loss": 6.850005263940539, "test_0_acc1": 0.362, "test_0_acc5": 1.578, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 1, "n_parameters": 5717416} {"train_lr": 0.00040089999999999305, "train_loss": 6.754413659242894, "test_0_loss": 6.145018349987379, "test_0_acc1": 3.726, "test_0_acc5": 10.93, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 2, "n_parameters": 5717416} {"train_lr": 0.0008008000000000196, "train_loss": 6.625597798519379, "test_0_loss": 5.638628653967449, "test_0_acc1": 7.224, "test_0_acc5": 18.596, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 3, "n_parameters": 5717416} {"train_lr": 0.001200700000000036, "train_loss": 6.560625480233337, "test_0_loss": 5.315861636678607, "test_0_acc1": 10.064, "test_0_acc5": 23.916, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 4, "n_parameters": 5717416} {"train_lr": 0.0016005999999999657, "train_loss": 6.54946454628099, "test_0_loss": 5.134192888048774, "test_0_acc1": 11.984, "test_0_acc5": 27.634, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 5, "n_parameters": 5717416} {"train_lr": 0.0020004999999999914, "train_loss": 6.542691466810225, "test_0_loss": 5.129832990186304, "test_0_acc1": 12.648, "test_0_acc5": 28.482, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 6, "n_parameters": 5717416} {"train_lr": 0.002400399999999976, "train_loss": 6.536189370113406, "test_0_loss": 5.018598655821495, "test_0_acc1": 13.942, "test_0_acc5": 31.118, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 7, "n_parameters": 5717416} {"train_lr": 0.002800300000000059, "train_loss": 6.503189501430777, "test_0_loss": 4.9134670785048495, "test_0_acc1": 15.256, "test_0_acc5": 33.352, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 8, "n_parameters": 5717416} {"train_lr": 0.0032002000000000983, "train_loss": 6.5094980549373975, "test_0_loss": 4.851811613246408, "test_0_acc1": 15.578, "test_0_acc5": 34.102, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 9, "n_parameters": 5717416} {"train_lr": 0.0036000999999999585, "train_loss": 6.511975479259384, "test_0_loss": 5.143745462633598, "test_0_acc1": 13.898, "test_0_acc5": 30.834, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 10, "n_parameters": 5717416} {"train_lr": 0.0039023577500088853, "train_loss": 6.418486285981515, "test_0_loss": 5.218988336208, "test_0_acc1": 12.054, "test_0_acc5": 27.72, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 11, "n_parameters": 5717416} {"train_lr": 0.003882057134063648, "train_loss": 6.32594983450038, "test_0_loss": 5.0386335921455325, "test_0_acc1": 14.276, "test_0_acc5": 31.454, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 12, "n_parameters": 5717416} {"train_lr": 0.0038599040893470705, "train_loss": 6.353343641300567, "test_0_loss": 4.799374374913163, "test_0_acc1": 16.064, "test_0_acc5": 34.754, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 13, "n_parameters": 5717416} {"train_lr": 0.0038359204782395374, "train_loss": 6.200804713199274, "test_0_loss": 5.33436276099656, "test_0_acc1": 11.77, "test_0_acc5": 26.914, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 14, "n_parameters": 5717416} {"train_lr": 0.0038101299696696937, "train_loss": 6.372054211956134, "test_0_loss": 5.132588769103652, "test_0_acc1": 14.968, "test_0_acc5": 32.62, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 15, "n_parameters": 5717416} {"train_lr": 0.0037825580157558377, "train_loss": 6.417677939938699, "test_0_loss": 4.666486162294277, "test_0_acc1": 15.988, "test_0_acc5": 34.454, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 16, "n_parameters": 5717416} {"train_lr": 0.0037532318266873923, "train_loss": 6.26336412826221, "test_0_loss": 5.38978447795143, "test_0_acc1": 12.402, "test_0_acc5": 28.042, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 17, "n_parameters": 5717416} {"train_lr": 0.003722180343872929, "train_loss": 6.142464170686537, "test_0_loss": 4.77877281021782, "test_0_acc1": 15.232, "test_0_acc5": 33.184, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 18, "n_parameters": 5717416} {"train_lr": 0.0036894342113766073, "train_loss": 6.004803928301679, "test_0_loss": 5.506222593730944, "test_0_acc1": 10.49, "test_0_acc5": 24.774, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 19, "n_parameters": 5717416} {"train_lr": 0.0036550257456777735, "train_loss": 6.135914517725877, "test_0_loss": 5.406659782199775, "test_0_acc1": 11.604, "test_0_acc5": 27.232, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 20, "n_parameters": 5717416} {"train_lr": 0.0036189889037779527, "train_loss": 6.093383449254086, "test_0_loss": 5.010170250158621, "test_0_acc1": 13.934, "test_0_acc5": 30.648, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 21, "n_parameters": 5717416} {"train_lr": 0.0035813592496895356, "train_loss": 6.059879529152176, "test_0_loss": 5.371730933796497, "test_0_acc1": 10.762, "test_0_acc5": 25.292, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 22, "n_parameters": 5717416} {"train_lr": 0.00354217391933767, "train_loss": 6.073066240544323, "test_0_loss": 5.048453500617107, "test_0_acc1": 13.242, "test_0_acc5": 29.126, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 23, "n_parameters": 5717416} {"train_lr": 0.0035014715839127445, "train_loss": 6.08733743352951, "test_0_loss": 5.331494154414533, "test_0_acc1": 10.974, "test_0_acc5": 25.072, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 24, "n_parameters": 5717416} {"train_lr": 0.003459292411705762, "train_loss": 6.1378554502646505, "test_0_loss": 5.264302222605172, "test_0_acc1": 10.784, "test_0_acc5": 25.304, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 25, "n_parameters": 5717416} {"train_lr": 0.0034156780284671424, "train_loss": 6.1356256470786965, "test_0_loss": 5.568371077798074, "test_0_acc1": 8.102, "test_0_acc5": 19.622, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 26, "n_parameters": 5717416} {"train_lr": 0.003370671476327724, "train_loss": 6.071214107396029, "test_0_loss": 5.589009375886435, "test_0_acc1": 9.484, "test_0_acc5": 22.578, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 27, "n_parameters": 5717416} {"train_lr": 0.00332431717132075, "train_loss": 5.972238908568732, "test_0_loss": 5.707102177010388, "test_0_acc1": 7.978, "test_0_acc5": 19.746, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 28, "n_parameters": 5717416} {"train_lr": 0.0032766608595485736, "train_loss": 6.1025936421539955, "test_0_loss": 5.42314580428013, "test_0_acc1": 11.556, "test_0_acc5": 26.76, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 29, "n_parameters": 5717416} {"train_lr": 0.0032277495720376523, "train_loss": 6.084920492580087, "test_0_loss": 5.923481827123914, "test_0_acc1": 6.536, "test_0_acc5": 16.768, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 30, "n_parameters": 5717416} {"train_lr": 0.003177631578323426, "train_loss": 6.115835655793298, "test_0_loss": 5.460594815469596, "test_0_acc1": 9.702, "test_0_acc5": 23.214, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 31, "n_parameters": 5717416} {"train_lr": 0.003126356338814952, "train_loss": 6.0647800623846475, "test_0_loss": 5.592425890023786, "test_0_acc1": 10.268, "test_0_acc5": 23.954, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 32, "n_parameters": 5717416} {"train_lr": 0.0030739744559831173, "train_loss": 6.054409027099609, "test_0_loss": 5.690968008889499, "test_0_acc1": 9.494, "test_0_acc5": 22.702, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 33, "n_parameters": 5717416} {"train_lr": 0.003020537624422026, "train_loss": 6.063710031606597, "test_0_loss": 4.698054779361473, "test_0_acc1": 15.238, "test_0_acc5": 32.862, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 34, "n_parameters": 5717416} {"train_lr": 0.0029660985798329645, "train_loss": 5.910670252202703, "test_0_loss": 5.064415029280474, "test_0_acc1": 13.038, "test_0_acc5": 29.468, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 35, "n_parameters": 5717416} {"train_lr": 0.0029107110469803756, "train_loss": 5.995033393136794, "test_0_loss": 4.890588184388418, "test_0_acc1": 14.9, "test_0_acc5": 32.556, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 36, "n_parameters": 5717416} {"train_lr": 0.002854429686672256, "train_loss": 5.9739233070044975, "test_0_loss": 5.277839526410142, "test_0_acc1": 12.19, "test_0_acc5": 28.048, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 37, "n_parameters": 5717416} {"train_lr": 0.002797310041816381, "train_loss": 5.918417158744318, "test_0_loss": 5.211479980214925, "test_0_acc1": 13.658, "test_0_acc5": 30.232, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 38, "n_parameters": 5717416} {"train_lr": 0.002739408482605983, "train_loss": 5.925514445745116, "test_0_loss": 5.419682004859031, "test_0_acc1": 10.596, "test_0_acc5": 24.802, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 39, "n_parameters": 5717416} {"train_lr": 0.002680782150889308, "train_loss": 5.883729684505341, "test_0_loss": 5.13738645015431, "test_0_acc1": 13.256, "test_0_acc5": 29.476, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 40, "n_parameters": 5717416} {"train_lr": 0.0026214889037779947, "train_loss": 5.8768603530623835, "test_0_loss": 5.099718636911188, "test_0_acc1": 14.214, "test_0_acc5": 32.066, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 41, "n_parameters": 5717416} {"train_lr": 0.002561587256548306, "train_loss": 5.914836130887389, "test_0_loss": 5.2416254764783865, "test_0_acc1": 11.636, "test_0_acc5": 26.856, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 42, "n_parameters": 5717416} {"train_lr": 0.0025011363248938225, "train_loss": 5.900150206830386, "test_0_loss": 5.101568935318627, "test_0_acc1": 13.276, "test_0_acc5": 30.104, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 43, "n_parameters": 5717416} {"train_lr": 0.00244019576658606, "train_loss": 5.86289567250809, "test_0_loss": 4.907099856829994, "test_0_acc1": 14.74, "test_0_acc5": 32.012, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 44, "n_parameters": 5717416} {"train_lr": 0.0023788257225985108, "train_loss": 5.835502566836721, "test_0_loss": 5.569610283303093, "test_0_acc1": 11.684, "test_0_acc5": 26.968, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 45, "n_parameters": 5717416} {"train_lr": 0.002317086757755297, "train_loss": 5.902843760119544, "test_0_loss": 5.3058019126750535, "test_0_acc1": 13.534, "test_0_acc5": 29.732, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 46, "n_parameters": 5717416} {"train_lr": 0.0022550398009607707, "train_loss": 5.798361852205248, "test_0_loss": 5.197937856175087, "test_0_acc1": 13.114, "test_0_acc5": 29.348, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 47, "n_parameters": 5717416} {"train_lr": 0.0021927460850704548, "train_loss": 5.73072993350353, "test_0_loss": 4.734242562521595, "test_0_acc1": 17.252, "test_0_acc5": 35.896, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 48, "n_parameters": 5717416} {"train_lr": 0.0021302670864610006, "train_loss": 5.69394142336125, "test_0_loss": 5.058369449217657, "test_0_acc1": 14.906, "test_0_acc5": 32.252, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 49, "n_parameters": 5717416} {"train_lr": 0.002067664464360847, "train_loss": 5.709019121363294, "test_0_loss": 4.3592056012351925, "test_0_acc1": 19.068, "test_0_acc5": 39.056, "test_5_loss": 10.509043875979218, "test_5_acc1": 0.72375, "test_5_acc5": 2.323, "epoch": 50, "n_parameters": 5717416} {"train_lr": 0.002005000000000026, "train_loss": 5.700551097960972, "test_0_loss": 4.753117966484123, "test_0_acc1": 15.846, "test_0_acc5": 33.87, "test_5_loss": 10.509043875979218, "test_5_acc1": 0.72375, "test_5_acc5": 2.323, "epoch": 51, "n_parameters": 5717416} {"train_lr": 0.0019423355356391193, "train_loss": 5.7724813230031975, "test_0_loss": 4.528603377741876, "test_0_acc1": 18.206, "test_0_acc5": 37.726, "test_5_loss": 10.509043875979218, "test_5_acc1": 0.72375, "test_5_acc5": 2.323, "epoch": 52, "n_parameters": 5717416}

ksouvik52 avatar Jul 24 '22 16:07 ksouvik52

This is the command we are using:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=12349 --use_env main_adv_deit.py --model deit_tiny_patch16_224_adv --batch-size=128 --data-path /datasets/imagenet-ilsvrc2012 --attack-iter 1 --attack-epsilon 4 --attack-step-size 4 --epoch 100 --reprob 0 --no-repeated-aug --sing singln --drop 0 --drop-path 0 --start_epoch 0 --warmup-epochs 10 --cutmix 0 --output_dir save/deit_adv/deit_tiny_patch16_224

ksouvik52 avatar Jul 24 '22 16:07 ksouvik52

Just to be sure, you are using deit-tiny? Did the deit-small work on you side now after yesterday? (Want to diagnose the problem by step)

ytongbai avatar Jul 24 '22 16:07 ytongbai

Okay, I will try with diet_small...have never tried yet.

ksouvik52 avatar Jul 24 '22 16:07 ksouvik52

Oh, but in your yesterday's response(https://github.com/ytongbai/ViTs-vs-CNNs/issues/8#issue-1315283100), you mentioned you are using deit_small_patch16_224_adv

'python -m torch.distributed.launch --nproc_per_node=4 --master_port=5672 --use_env main_adv_deit.py --model deit_small_patch16_224_adv --batch-size 128 --data-path /datasets/imagenet-ilsvrc2012 --attack-iter 1 --attack-epsilon 4 --attack-step-size 4 --epoch 100 --reprob 0 --no-repeated-aug --sing singln --drop 0 --drop-path 0 --start_epoch 0 --warmup-epochs 10 --cutmix 0 --output_dir save/deit_adv/deit_small_patch16_224'

ytongbai avatar Jul 24 '22 16:07 ytongbai

Yes, I never tried with 8 gpu node for deit small. Trying now. So, the goal is to narrow down whether its model specific?

ksouvik52 avatar Jul 24 '22 16:07 ksouvik52

Oh I see..

ytongbai avatar Jul 24 '22 16:07 ytongbai

One reason to choose deit_tiny is to see the results quickly, as the deit_small takes little longer to train. unfortunately it is not working yet.

ksouvik52 avatar Jul 24 '22 16:07 ksouvik52

Just want to debug the problem, since we didn't use deit-tiny setting in the paper. So first step is to make sure the original recipe works on your side to rule out other problems.

But yes I do have some assumptions on DeiT-Tiny. As we mentioned in the paper, the augmentations are very strong, and with strong regularization like adversariall training, therefore even with the DeiT-small 100 epoch model, the regularization is very strong. So if you shrink down the model size, I would assume the model is over-regularized and cannot give good results, maybe using less augmentation would help. But if the model size goes up, like adversarail training on base model, that wouldn't be a problem.

(Previously in our experiment, DeiT-tiny would not need strong augmentation to perform well on adversarial training image)

Hope it helps :)

ytongbai avatar Jul 24 '22 17:07 ytongbai

Can you let me know what arguments you used to tone down augmentation on diet tiny to make it work?

ksouvik52 avatar Jul 24 '22 17:07 ksouvik52

Sure, we didn't fully explore under this setting since it is out of the scope(fair comparison between these two structures), but still got some initial recipe for you to explore.

With tiny model: With sgd, cosine lr scheduler, lr 0.1, no regularization and no augmentation, clean acc is around 54.9. pgd-5 is 31.7. (Adam gives similar results but less stable, so would recommand the above setting.

ytongbai avatar Jul 24 '22 17:07 ytongbai

Running on deit_small, will let you know the results...its around 2x slower than deit_tiny, so will be little late.

ksouvik52 avatar Jul 24 '22 17:07 ksouvik52

Sure no worries, I'll be around to help

ytongbai avatar Jul 24 '22 18:07 ytongbai