
JSMA untargeted selects target at random

Open carlini opened this issue 7 years ago • 23 comments

The JSMA untargeted attack selects the target class at random. A better result can be obtained by modifying the minimization objective to make the original class more-wrong instead of making a random class less-wrong.

The untargeted FGM, BIM, and CW attacks behave in this way, and it would be clearer if JSMA behaved this way too.

carlini avatar Aug 14 '17 22:08 carlini

This would be a change in the attack's behavior, which, in my opinion, would be confusing to people who are just looking for an implementation of the paper. It might be less confusing to make this a variant of the attack rather than modify the original behavior.

npapernot avatar Aug 15 '17 03:08 npapernot

I would vote for calling this something different then. Or maybe putting a warning in the docstring.

carlini avatar Aug 17 '17 23:08 carlini

Just wondering: do you already have code for this or is this more like a feature request?

npapernot avatar Sep 20 '17 23:09 npapernot

I have code that does this sitting somewhere, yeah.

carlini avatar Sep 21 '17 01:09 carlini

I agree that some kind of warning would be helpful here; I was expecting the untargeted JSMA attack to make the original class more wrong. I'd be willing to submit a PR for this once symbolic JSMA is merged.

AngusG avatar Sep 25 '17 19:09 AngusG

It looks like we should standardize the expected behavior across all attacks so that they are consistent in the simple misclassification (untargeted) mode, as opposed to the source-target misclassification (targeted) mode. Given that JSMA is a source-target attack by design, this will require some thinking to adopt a default behavior that is most natural.

npapernot avatar Sep 25 '17 20:09 npapernot

So the attack I have is targeted by design, and the untargeted attack is just a corresponding flip-the-objective type of thing. I think we can apply that to JSMA too. This is what I did some time earlier last year when I wanted JSMA to be untargeted. It would definitely be best to try out a few things and make sure that we pick the right choice though.

Let me simplify notation for GitHub since it can't stylize math, and assume JSMA operates over a single pixel p instead of pairs pq. In JSMA, for image x and target t, let's define

a_p = (d/dx_p) Z(x)_t
b_p = sum_{j != t} (d/dx_p) Z(x)_j

So that to perform JSMA, we optimize for argmax_p (-a_p*b_p) * (a_p > 0) * (b_p < 0)

Where the terms here correspond to:

  • a_p > 0 : make the target class more likely
  • b_p < 0 : make the other classes less likely
  • (-a_p*b_p) : break ties by making the change biggest

To do this in an untargeted manner, we just have to set t to be the original class and flip a few signs, and instead optimize for argmax_p (-a_p*b_p) * (a_p < 0) * (b_p > 0)

So that now we have

  • a_p < 0 : make the target (original) class LESS likely
  • b_p > 0 : make the other classes MORE likely
  • (-a_p*b_p) : break ties by making the change biggest

I think this worked when I did it.
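To make the flip concrete, here is a minimal numpy sketch of the pixel-selection step for both modes. The function name, shapes, and `None` return are my own choices for illustration, not the cleverhans API; it assumes the logit Jacobian dZ(x)_j/dx_p has already been computed.

```python
import numpy as np

def jsma_pixel_choice(jacobian, t, untargeted=False):
    """Pick the pixel p to perturb for one single-pixel JSMA iteration.

    jacobian: array of shape (nb_classes, nb_pixels), jacobian[j, p] = dZ(x)_j / dx_p.
    t: target class (targeted mode), or the original class (untargeted mode).
    Returns the index p maximizing -a_p*b_p under the sign constraints,
    or None if no pixel satisfies them.
    """
    a = jacobian[t]               # a_p: effect of pixel p on the class-t logit
    b = jacobian.sum(axis=0) - a  # b_p: combined effect on all other logits
    if untargeted:
        # flipped signs: push the original class down, the others up
        mask = (a < 0) & (b > 0)
    else:
        # targeted: push the target class up, the others down
        mask = (a > 0) & (b < 0)
    if not mask.any():
        return None
    # -a*b is positive wherever the mask holds, in both modes
    saliency = np.where(mask, -a * b, 0.0)
    return int(np.argmax(saliency))
```

The full attack also handles pixel pairs and iterates until misclassification or a distortion budget, but the mode switch is just the sign flip shown in the mask.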

carlini avatar Sep 25 '17 21:09 carlini

I'm a bit confused by the use of "more-wrong" and "less-wrong" in this thread. Do you mean "less probable" and "more probable" respectively?

goodfeli avatar Nov 10 '17 22:11 goodfeli

@npapernot What's the status of this issue? It has been open for a long time without activity. Do we need to more actively find someone to take the baton on this? Maybe prioritize it at the next work session?

catherio avatar Feb 16 '18 02:02 catherio

I don't think anyone is working on it, but I think it's OK to leave this issue open, unless we would like to keep ideas for enhancement somewhere else (for instance, in an MD file at the root of the repo or a wiki page) so we don't clutter the issue tracker?

npapernot avatar Feb 16 '18 13:02 npapernot

Seems fine, just going to mark it as "help wanted" for "nobody has the baton"

catherio avatar Feb 16 '18 18:02 catherio

Thank you Catherine!

npapernot avatar Feb 17 '18 01:02 npapernot

@npapernot Can I work on this one? I think Carlini's idea will work just fine.

aashish-kumar avatar Jul 13 '18 18:07 aashish-kumar

I'm not sure we ever settled on what the best option is here. It would probably call for a bit of research to compare the different ways of making the attack untargeted. Maybe it would make sense to build in a consistent behavior across the different targeted attacks that allows the user to select different strategies when no specific target labels are specified, e.g., choose the target label to be (a) the runner-up according to softmax probabilities, (b) the label with the smallest softmax probability, or (c) a random target label.

npapernot avatar Jul 16 '18 07:07 npapernot

@npapernot I think all three options you have mentioned are heuristic-based targeted attacks. These will not be purely untargeted like the FGSM method. Based on Carlini's idea, I have created an untargeted JSMA. It does not require a target label to generate the attack. I have compared it with the perturbations required in the targeted setting. Here is the graph.

(graph: jsma_untar) Here the min and average perturbations are for the targeted JSMA attack. These were collected by running JSMA for all targets, then taking the minimum (excluding the true class) and the average.

The untargeted JSMA is better than the averaged targeted JSMA and works in a purely untargeted form. I only had to make very small changes to your original JSMA. I have kept it as pull request #458 for deliberation on the right way to integrate it.

aashish-kumar avatar Jul 16 '18 09:07 aashish-kumar

Thank you Aashish for making a PR and running the experiment above, I appreciate the contribution. There are a few issues with PR #458:

  • we should keep the API consistent with other attacks. The SaliencyMapMethod class has a parameter called y_target that can be used to trigger the untargeted attack when it is set to None.
  • a lot of the code for jsma_symbolic_untargetted in PR #458 is duplicated from jsma_symbolic. We should avoid duplicating code and instead try to merge both functions.
  • as mentioned previously (see #451), we should not create a new tutorial for the untargeted JSMA (file mnist_tutorial_untar_jsma.py) and instead modify the existing JSMA tutorial to include the untargeted behavior.

Given the amount of code that would need to change to address these comments, it will be easier to start the PR from scratch, so I think we should close this PR for now and keep all of the conversation here in one place (for future reference if needed). This way we can first discuss how we'd like to implement this change before working on a new PR.

Back to the conversation we had above, there are four strategies that surfaced in the discussion to address the case where no specific target label is provided by the user to the attack class:

  1. minimize the probability of assigning the correct class
  2. maximize the probability of assigning the runner up class (the class that is originally assigned the second largest softmax probability)
  3. maximize the probability of assigning the least likely class (the class that is originally assigned the smallest softmax probability)
  4. maximize the probability of assigning a randomly chosen class (a class is selected at random for each test point attacked)

(Definitions of 2-4 assume that the attack is only fed correctly classified test inputs, which is a reasonable thing to do and assume in my opinion.)

Currently, the FGM, BIM, and CW attacks follow strategy 1. when no target is provided and y_target is set to None. This is the strategy that Nicholas suggested a variant of for the JSMA above, and that you implemented in #458. I think it makes sense for a strategy like that to be the default behavior (when y_target is set to None), modulo some minor modifications (see below for comments on the specific strategy).

I think it would also make sense to include strategies 2-4 in all attacks. One way to do this would be to accept the following three strings as values for the y_target argument: runnerup (for strategy 2.), leastlikely (for strategy 3.), and rand (for strategy 4.). This would help with benchmarking the attacks in targeted mode without having to manually specify targets for each test input.

Coming back to the implementation of strategy 1. for the JSMA, I agree that optimizing for argmax_p (-a_p*b_p) * (a_p < 0) * (b_p > 0) is consistent with the spirit of the original targeted JSMA attack. One potential limitation is that without a specific target class, the attack might oscillate between different classes that differ from the original class before one of them is assigned a larger logit than the correct class. For this reason, I'd be interested in comparing the success rate and mean perturbation of strategies 1. and 2. on the full MNIST & CIFAR10 test sets before we set strategy 1. as the default untargeted strategy, i.e. the strategy followed when y_target=None in SaliencyMapMethod. This will help clarify whether this is an issue we should be concerned about.

npapernot avatar Jul 18 '18 11:07 npapernot

2-4 are all targeted attacks with different ways of defining the target, right? Should we just leave the attacks taking a y_target argument and provide easy ways to compute the target? For example: 2. pass y_target = runner_up(x, model); 3. pass y_target = least_likely(x, model); 4. pass y_target = random_class(nb_classes).

This way we don't need to have all the attacks implement all the target selection strategies, parse a lot of if statements to decode string names of target selection strategies, etc.

goodfeli avatar Jul 19 '18 00:07 goodfeli

That's a good suggestion, yes. We could make three functions, runner_up, least_likely, and random_class, that take either a single input or a batch of inputs as well as a model, and return the target classes encoded as a one-hot vector or as labels.
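Roughly, the three helpers could look like the numpy sketch below. To keep it framework-agnostic I take precomputed softmax probabilities instead of (x, model); the function names match the comment above, but the signatures and the to_one_hot helper are my own assumptions, not an agreed API.

```python
import numpy as np

def runner_up(probs):
    # probs: (batch, nb_classes) softmax output.
    # Target = class with the second-largest probability per example.
    return np.argsort(probs, axis=1)[:, -2]

def least_likely(probs):
    # Target = class with the smallest probability per example.
    return np.argmin(probs, axis=1)

def random_class(batch_size, nb_classes, rng=np.random):
    # Target = a class drawn uniformly at random per example.
    return rng.randint(0, nb_classes, size=batch_size)

def to_one_hot(labels, nb_classes):
    # Encode integer labels as one-hot vectors for attacks expecting y_target.
    return np.eye(nb_classes)[labels]
```

Note that runner_up and least_likely can coincide with the true class on misclassified inputs, which is why the definitions above assume only correctly classified inputs are fed to the attack.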

npapernot avatar Jul 19 '18 07:07 npapernot

@npapernot I agree with the suggestion. Continuing the discussion, I collected data for the JSMA attack on the MNIST test set. Here are the results from the experiment.

On this graph, I have sorted the prediction probabilities & perturbations from the targeted attacks and identified which class the minimum perturbation belongs to, ignoring the true class (model prediction accuracy was 99.33). (chart1) It was no surprise that the runner-up class scored the highest, but beyond that there was a good spread across the other classes. This means choosing the runner-up class as the target class will give us the least-perturbed adversarial images only ~30% of the time.

As part of the experiment I also evaluated the average perturbation for the runner-up classes and compared it to the untargeted JSMA perturbations. Here are the results. (chart2) As we can see, untargeted JSMA performs quite well on average. I agree that there is a possibility of oscillation in untargeted attacks, but the correlation between the runner-up predicted class and the perturbations is not high enough, which results in inferior attacks on average. These results are for JSMA attacks only; since it is an iterative attack with quite small updates, it is a good candidate for analyzing perturbations.

aashish-kumar avatar Jul 20 '18 15:07 aashish-kumar

Thanks for the detailed analysis. This is sufficient evidence to start implementing strategy 1 as described above as the untargeted JSMA strategy. Could you put a PR together while addressing the comments left above?

Also, if you have enough time, it would be nice to implement in a separate PR the 3 functions for strategies 2-4 as Ian suggested in his comment. If you don't have time, I'll flag it as a feature improvement waiting for a contributor in the issue tracker.

I will be offline until August 10 at which point I will take a look at the PR(s) unless someone else already approved the changes before I get back to it.

npapernot avatar Jul 20 '18 21:07 npapernot

@npapernot If no one is currently working on this, can I take it up? I need to decrease the probability of assigning the correct class, and then implement the three alternative strategies (2-4), right?

vikramnitin9 avatar Nov 25 '18 15:11 vikramnitin9

@vikramnitin9 @npapernot I was busy with a release. I will be submitting the pull request for the feature within a week.

aashish-kumar avatar Nov 25 '18 16:11 aashish-kumar

Okay

vikramnitin9 avatar Nov 25 '18 17:11 vikramnitin9