FeatureAttack
Strongest attack against Feature Scatter and Adversarial Interpolation
Feature Attack
Important
About 90% of the code is copied from FeatureScatter and Madry's PGD adversarial training.
My work
I created Feature Attack, which is stronger than the PGD and CW attacks against models trained with Feature Scatter and Adversarial Interpolation.
Reference Model
Models trained on CIFAR10: FS (Feature Scatter) and Adv_inter (Adversarial Interpolation)
Evaluate
Feature Scatter
sh fs_eval_feature_attack.sh
Madry's
cd cifar10_challenge && python feature_attack_batch_tf.py
Result
Defense | clean | FGSM | PGD20-2-8 | CW20-2-8 | FeatureAttack20-1-8-100(100 target images) | adv_test_images |
---|---|---|---|---|---|---|
Feature Scatter | 90.3 | 78.4 | 71.1 | 62.4 | 36.94 | |
Adv_inter | 90.5 | 78.1 | 74.4 | 69.5 | 37.64 | |
Madry | 87.25 | | 45.87 | 46.37 | | |
Sensible adversarial learning | 91.51 | 74.32 | 62.04 | 59.91 | 43.76 | sensible_adv_x |
bilateral_AT (mosa_eps4) | 92.8 | | 71.0(pgd100-2-8) | 67.9(CW100-2-8) | 32.28 | bilater_adv_x |
GCE | 62.74 | | 9.55(MNIST_PGD40-0.01-0.2) | 0 | | |
TRADES | 84.92 | | 55.4 | 53.89 | 52.94(50-1-8-200) | TRADES_adv_x_float_0~1_npy |
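
The attack labels in the table header pack the attack settings into the name. The sketch below shows one way to unpack them. It assumes the usual reading of this notation: `PGD20-2-8` means 20 steps, step size 2/255 and eps 8/255, and the optional fourth field of `FeatureAttack20-1-8-100` is the number of target images. `AttackConfig` and `parse_attack_label` are hypothetical names for illustration and are not part of this repo.

```python
import re
from typing import NamedTuple, Optional


class AttackConfig(NamedTuple):
    name: str
    steps: int
    step_size: float            # on the [0, 1] pixel scale
    eps: float                  # on the [0, 1] pixel scale
    num_targets: Optional[int]  # only present for FeatureAttack-style labels


# <name><steps>-<step size>-<eps>[-<number of target images>]
# Assumption: step size and eps are integers in units of 1/255.
_LABEL = re.compile(r"([A-Za-z_]+)(\d+)-(\d+)-(\d+)(?:-(\d+))?")


def parse_attack_label(label: str) -> AttackConfig:
    """Split a label such as 'PGD20-2-8' into its assumed fields."""
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognised attack label: {label!r}")
    name, steps, step, eps, targets = m.groups()
    return AttackConfig(
        name=name,
        steps=int(steps),
        step_size=int(step) / 255.0,
        eps=int(eps) / 255.0,
        num_targets=int(targets) if targets is not None else None,
    )


print(parse_attack_label("PGD20-2-8"))
print(parse_attack_label("FeatureAttack20-1-8-100"))
```

Labels that use decimal values, such as `MNIST_PGD40-0.01-0.2`, are deliberately rejected by this sketch because their units differ from the 1/255 convention assumed here.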
About the adversarial test images
For the CIFAR10 test set:
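import torch

eps = 8. / 255.
nat_X = ALL_CLEAN_TEST_IMAGES  # clean test images in PyTorch's default order, float in [0, 1]
adv_X_uint8 = torch.load('ADV_TEST_IMAGES_PATH')  # adversarial test images stored as uint8
adv_X = adv_X_uint8.type(torch.FloatTensor) / 255.  # rescale to [0, 1]
assert adv_X.min() >= 0. and adv_X.max() <= 1.
abs_diff = torch.abs(adv_X - nat_X)
# Every pixel must stay within the L-inf budget (small tolerance for rounding)
assert abs_diff.max() <= eps + 0.0001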
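
The same check can be wrapped in a small helper that accepts either uint8 images or float images already in [0, 1] (the TRADES file name suggests the latter format, which is an assumption here). This is a minimal NumPy sketch run on synthetic data. `to_unit_float`, `linf_distance` and `within_budget` are hypothetical helper names, not functions from this repo.

```python
import numpy as np


def to_unit_float(x):
    """Return images as float32 in [0, 1]; uint8 input is divided by 255."""
    x = np.asarray(x)
    if x.dtype == np.uint8:
        return x.astype(np.float32) / 255.0
    return x.astype(np.float32)


def linf_distance(nat_x, adv_x):
    """Largest per-pixel absolute difference between two image batches."""
    nat = to_unit_float(nat_x)
    adv = to_unit_float(adv_x)
    if nat.shape != adv.shape:
        raise ValueError(f"shape mismatch: {nat.shape} vs {adv.shape}")
    return float(np.abs(adv - nat).max())


def within_budget(nat_x, adv_x, eps=8.0 / 255.0, tol=1e-4):
    """True if adv_x is a valid image batch and stays within eps of nat_x."""
    adv = to_unit_float(adv_x)
    in_range = adv.min() >= 0.0 and adv.max() <= 1.0
    return bool(in_range and linf_distance(nat_x, adv_x) <= eps + tol)


# Synthetic stand-in for the real test set: random uint8 "images" and a
# perturbation of at most 8 intensity levels per pixel.
rng = np.random.default_rng(0)
nat = rng.integers(0, 256, size=(4, 3, 32, 32), dtype=np.uint8)
noise = rng.integers(-8, 9, size=nat.shape)
adv = np.clip(nat.astype(np.int16) + noise, 0, 255).astype(np.uint8)

print(within_budget(nat, adv))
```

To use it on real data, convert the loaded tensors to NumPy arrays first and pass the clean and adversarial batches in the same order.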