pytorch-nips2017-attack-example
L2 distance between adversarial example and the original input data
In `attacks.AttackCarliniWagnerL2._optimize` there is:

```python
if input_orig is None:
    dist = l2_dist(input_adv, input_var, keepdim=False)
```

The problem is that `input_var` has already been mapped into tanh-space, so it is no longer the original input, whereas the adversarial example `input_adv` lives in the original image space. Without mapping `input_var` back out of tanh-space, the computed `dist` is not the true L2 distance between the adversarial example and the original image. Carlini's original code performs exactly this inverse mapping before measuring the distance. Thanks for checking!
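To make the mismatch concrete, here is a minimal self-contained sketch of the tanh-space change of variables used by the C&W L2 attack. The helper names `to_tanh_space` / `from_tanh_space` and the tensor shapes are hypothetical, not the repo's actual functions; the point is that the distance must be measured in image space, not tanh-space:

```python
import torch

def to_tanh_space(x, eps=1e-6):
    # Map an image x in [0, 1] into unbounded tanh-space via atanh.
    # The (1 - eps) rescale keeps atanh away from its singularities at +/-1.
    return torch.atanh((x * 2 - 1) * (1 - eps))

def from_tanh_space(w):
    # Inverse mapping: tanh-space back to image space [0, 1].
    return (torch.tanh(w) + 1) / 2

def l2_dist(a, b):
    # Plain L2 norm of the difference (hypothetical stand-in for the repo's l2_dist).
    return ((a - b) ** 2).sum().sqrt()

x = torch.rand(3, 8, 8)                 # hypothetical original image in [0, 1]
w = to_tanh_space(x)                    # the optimization variable lives here
input_adv = from_tanh_space(w + 0.01 * torch.randn_like(w))  # a perturbed sample

# Wrong: compares the image-space adversarial against the tanh-space variable.
dist_wrong = l2_dist(input_adv, w)
# Right: map the tanh-space variable back to image space first.
dist_right = l2_dist(input_adv, from_tanh_space(w))
```

Since `from_tanh_space(to_tanh_space(x))` recovers `x` (up to the small `eps` rescale), `dist_right` measures the perturbation against the actual original image, which is what the C&W objective is supposed to penalize.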