Question about Figure 3 in the paper
Hi, I am walking through the experiments via the code, and I find it hard to understand the result of Figure 3 in "Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning".
In this figure, the gradient norm when $\alpha$ is 0.8 stays above zero and seems a bit large compared to the other cases, while the test error rate remains low. Your paper suggests that a smaller gradient norm is desirable because it indicates flat minima, so this result seems counter-intuitive.
Could you give some more explanation about this?
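
For reference, here is how I understand the GNP update with balance coefficient $\alpha$: the gradient-norm penalty is approximated by a finite difference, giving a combined gradient $(1-\alpha)\,\nabla L(\theta) + \alpha\,\nabla L\!\left(\theta + r\,\nabla L/\lVert\nabla L\rVert\right)$. Below is a minimal PyTorch sketch from my own walkthrough; the function name `gnp_step`, the placeholders `model`/`loss_fn`, and the perturbation radius `r` are my assumptions, not your reference code, so please correct me if I have misread the update rule:

```python
import torch

def gnp_step(model, loss_fn, data, target, optimizer, alpha=0.8, r=0.01):
    """One GNP update: combine the plain gradient with the gradient at a
    perturbed point, approximating the gradient-norm penalty term."""
    # First pass: g = grad L(theta) at the current weights.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]

    # Perturb along the normalized gradient: theta' = theta + r * g / ||g||.
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
    scale = r / (grad_norm + 1e-12)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g, alpha=scale)

    # Second pass: grad L(theta') at the perturbed weights.
    optimizer.zero_grad()
    loss_fn(model(data), target).backward()

    # Restore theta and form (1 - alpha) * g + alpha * grad L(theta').
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(g, alpha=scale)
            p.grad.mul_(alpha).add_(g, alpha=1.0 - alpha)

    optimizer.step()
```

If I read the paper correctly, $\alpha = 1$ recovers the SAM-style update, which is why I expected a larger $\alpha$ such as 0.8 to suppress the gradient norm more strongly rather than leave it relatively large.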