Review Report 2 - Anonymous Reviewer B
The following peer review was solicited as part of the Distill review process.
The reviewer chose to remain anonymous. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.
Distill is grateful to the reviewer for taking the time to give such a thorough review of this article. Thoughtful and invested reviewers are essential to the success of the Distill project.
Conflicts of Interest: Reviewer disclosed no conflicts of interest.
The present submission proposes an explanation for adversarial examples based on insufficient regularization of the model. It is argued that a lack of regularization leads to a decision boundary that is tilted and hence vulnerable to adversarial examples.
In my opinion, this article suffers from a few issues. The three main issues are:
- It makes claims (at least suggestively) about adversarial examples that I think are wrong.
- Its coverage of the related work is not very good.
- It uses language (such as "A new angle" and "We challenge this intuition") that seems to claim substantially more novelty than there actually is. I don't mind an expository paper that is not novel, but I do mind over-claiming.
In addition, I felt the writing could be improved, but I assume that other reviewers can comment on that in more depth.
Perhaps the most serious of these is the first issue. At a high level, I am not convinced that L2 regularization / a tilted decision boundary is the primary reason for the existence of adversarial examples. While this is a natural explanation to consider, it does not match my own empirical experience; furthermore, my sense is that this explanation has occurred to others in the field as well, but has been de-emphasized because it does not account for all the facts on the ground. (I wish I had good citations covering this, but I do not know if or where it has been discussed in detail; however, there are various empirical papers showing that L2 regularization/weight decay does not work very well in comparison to adversarial training and other techniques.)
More concretely, three claims that I think are wrong are the following (a brief explanation of why I disagree follows each point):
- that adversarial examples are primarily due to tilting of the decision boundary: in high dimensions, every decision boundary (tilted or not) might have trouble avoiding adversarial examples (the first sketch after this list illustrates this)
- that weight decay for deep networks confers robustness to adversarial examples: weight decay seems to be too crude an instrument, and confers only limited robustness (the second sketch below contrasts weight decay with adversarial training)
- that "the fact that [neural networks] are often vulnerable to linear attacks of small magnitude suggests that they are strongly under-regularized" --- many authors have found that neural networks actually underfit the data during adversarial training and need additional capacity to successfully fit adversarial perturbations