Question about Figure 3 in the paper
Hi, I am walking through the experiments via the code, and I find it hard to understand the result of Figure 3 in "Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning".
In this figure, the gradient norm when $\alpha$ is 0.8 stays above zero and seems a bit large compared to the other cases, while the test error rate remains low. Your paper suggests that a smaller gradient norm is desirable because it indicates flat minima, so this result seems counter-intuitive.
Could you give some more explanation about this?
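
For reference, here is how I understand the GNP update with balance coefficient $\alpha$: the gradient-norm penalty is approximated by a finite difference, giving a combined gradient $(1-\alpha)\,\nabla L(\theta) + \alpha\,\nabla L\!\left(\theta + r\,\nabla L/\lVert\nabla L\rVert\right)$. Below is a minimal PyTorch sketch from my own walkthrough; the function name `gnp_step`, the placeholders `model`/`loss_fn`, and the perturbation radius `r` are my assumptions, not your reference code, so please correct me if I have misread the update rule:

```python
import torch

def gnp_step(model, loss_fn, data, target, optimizer, alpha=0.8, r=0.01):
    """One GNP update: combine the plain gradient with the gradient at a
    perturbed point, approximating the gradient-norm penalty term."""
    # First pass: g = grad L(theta) at the current weights.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]

    # Perturb along the normalized gradient: theta' = theta + r * g / ||g||.
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
    scale = r / (grad_norm + 1e-12)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=scale)

    # Second pass: grad L(theta') at the perturbed weights.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()

    # Restore theta and form (1 - alpha) * g + alpha * grad L(theta').
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(g, alpha=scale)
            p.grad.mul_(alpha).add_(g, alpha=1.0 - alpha)

    optimizer.step()
```

If I read the paper correctly, $\alpha = 1$ recovers the SAM-style update, which is why I expected a larger $\alpha$ such as 0.8 to suppress the gradient norm more strongly rather than leave it relatively large.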