adversarial-robustness-toolbox
A question about computing the adversarial saliency map in the JSMA attack
While using the JSMA method, I found that this toolbox's implementation of the adversarial saliency map differs slightly from the one in the original paper.
In this toolbox, the corresponding implementation looks like this:
def _saliency_map(self, x: np.ndarray, target: Union[np.ndarray, int], search_space: np.ndarray) -> np.ndarray:
    grads = self.estimator.class_gradient(x, label=target)
    grads = np.reshape(grads, (-1, self._nb_features))
    # Remove gradients for already used features
    used_features = 1 - search_space
    coeff = 2 * int(self.theta > 0) - 1
    grads[used_features == 1] = -np.inf * coeff
    if self.theta > 0:
        ind = np.argpartition(grads, -2, axis=1)[:, -2:]
    else:  # pragma: no cover
        ind = np.argpartition(-grads, -2, axis=1)[:, -2:]
    return ind
I notice that ind is selected directly from grads, i.e. from the gradient of the target class only.
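To make that selection step concrete, here is a minimal standalone sketch of the same logic in plain NumPy. The function name `top2_by_target_gradient` and the gradient values are made up for illustration; it only mirrors the masking and `np.argpartition` calls quoted above and is not the toolbox's own code.

```python
import numpy as np


def top2_by_target_gradient(grads, search_space, theta=1.0):
    """Return two feature indices per sample, chosen from the target-class gradient alone.

    Hypothetical helper for illustration only.
    grads:        (nb_samples, nb_features) gradient of the target class output
    search_space: same shape, 1 = feature may still be modified, 0 = already used
    """
    grads = np.array(grads, dtype=float)  # work on a copy
    sign = 1.0 if theta > 0 else -1.0
    # Push already used features to the losing end of the ranking
    grads[search_space == 0] = -np.inf * sign
    scores = grads if theta > 0 else -grads
    # The last two entries of the partition are the two highest scores
    return np.argpartition(scores, -2, axis=1)[:, -2:]


grads = np.array([[0.1, 0.9, -0.3, 0.5]])  # made-up gradient values
search_space = np.ones((1, 4), dtype=int)
print(sorted(top2_by_target_gradient(grads, search_space)[0].tolist()))
```

Nothing in this ranking looks at the gradients of the non-target classes, which is the point of the question.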
But in the original paper, the adversarial saliency map is computed like this:
or with a heuristic equation like this:
I'm confused about this difference.
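For reference, the two formulas from the original JSMA paper (Papernot et al., 2016) for the case of increasing features can be written roughly as below. This is a transcription, not a copy of the paper's typesetting. It assumes $F_j$ is the model output for class $j$, $X_i$ is input feature $i$, $t$ is the target class, and $(p_1, p_2)$ ranges over pairs of features still in the search space. Please check the paper for the exact notation.

```latex
% Saliency map for a single feature i and target class t
S(X, t)[i] =
\begin{cases}
0 & \text{if } \dfrac{\partial F_t(X)}{\partial X_i} < 0
    \ \text{ or } \ \sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial X_i} > 0 \\[2ex]
\dfrac{\partial F_t(X)}{\partial X_i}
    \left| \sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial X_i} \right| & \text{otherwise}
\end{cases}

% Pairwise heuristic: pick the pair of features maximising the product
\arg\max_{(p_1, p_2)}
\left( \sum_{i = p_1, p_2} \frac{\partial F_t(X)}{\partial X_i} \right)
\times
\left| \sum_{i = p_1, p_2} \sum_{j \neq t} \frac{\partial F_j(X)}{\partial X_i} \right|
```

Both formulas use the gradients of the non-target classes ($j \neq t$) in addition to the target-class gradient, whereas the quoted `_saliency_map` ranks features by the target-class gradient only.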
Hi @HIT1180300227, I think this implementation of JSMA neglects the additional terms involving gradients of classes other than the target class. Have you been able to use the attack successfully?
Hi @beat-buesser,
I use the JSMA method in the intrusion detection system (IDS) field. Specifically, I use the targeted JSMA method on statistical feature vectors as follows:
from art.attacks.evasion import SaliencyMapMethod
from art.estimators.classification import KerasClassifier

art_classifier = KerasClassifier(model=model, use_logits=False)
attack = SaliencyMapMethod(classifier=art_classifier, theta=theta, gamma=gamma, batch_size=1, verbose=True)
# x_test holds the original statistical feature vectors
targeted_x_test_jsma = attack.generate(x=x_test, y=numpy_targets)
Before the JSMA attack, I get 90% classification accuracy. After the attack, the classification accuracy drops to 20%.
So although the implementation of this attack is not consistent with the original paper, it still successfully confuses the classification model.
Why does the JSMA attack still work?
Hi @HIT1180300227, I think it still works because the main component of the gradients is the same, i.e. the direction in which the current class's logit value decreases. The paper is more accurate in that it requires additional terms for updates along this direction, to make sure the other logits are not increasing. It looks like these additional terms might be small or negligible for many applications, and they would be more complicated to implement.
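As a rough illustration of what those additional terms change, the sketch below contrasts a target-gradient-only choice with a pairwise selection in the style of the paper (increasing features, accepting a pair only when the summed target gradient is positive and the summed gradient of the other classes is negative). The function `paper_style_pair` and the Jacobian values are hypothetical, chosen so that the two rules disagree. It is not ART code and not a full attack.

```python
from itertools import combinations

import numpy as np


def paper_style_pair(jacobian, target, search_space):
    """Pick the feature pair maximising -alpha * beta with alpha > 0 and beta < 0.

    Hypothetical helper for illustration only.
    jacobian:     (nb_classes, nb_features), entry [j, i] = dF_j / dX_i
    search_space: 0/1 vector, 1 = feature may still be modified
    """
    alpha_per_feature = jacobian[target]                         # target-class gradient
    beta_per_feature = jacobian.sum(axis=0) - alpha_per_feature  # sum over j != target
    best_pair, best_score = None, 0.0
    for p, q in combinations(np.flatnonzero(search_space), 2):
        alpha = alpha_per_feature[p] + alpha_per_feature[q]
        beta = beta_per_feature[p] + beta_per_feature[q]
        if alpha > 0 and beta < 0 and -alpha * beta > best_score:
            best_pair, best_score = (int(p), int(q)), -alpha * beta
    return best_pair


# Made-up Jacobian: 3 classes, 4 features, target class 0
jacobian = np.array([
    [0.9, 0.5, 0.4, 0.1],
    [0.8, -0.3, -0.2, 0.0],
    [0.6, -0.2, -0.3, 0.0],
])
search_space = np.ones(4, dtype=int)

# Target-gradient-only choice: the two largest entries of the target row
target_only_pair = tuple(sorted(np.argsort(jacobian[0])[-2:].tolist()))
paper_pair = paper_style_pair(jacobian, target=0, search_space=search_space)
print(target_only_pair, paper_pair)
```

In this constructed case, feature 0 has the largest target gradient but also strongly increases the other classes, so the paper-style rule avoids it while the target-only rule selects it. When the other-class terms are small, the two rules tend to agree, which matches the explanation above.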