PyHessian
Hi, a question about an apparent inconsistency between the dynamics of tr(FIM) and tr(H).
Hi, thanks for your awesome work!
I noticed that in the paper PyHessian: Neural Networks Through the Lens of the Hessian, tr(H) keeps increasing during training.

And in the paper Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, the dominant eigenvalue of the Hessian with respect to the weights can decrease during small-batch training.

And in the paper Critical Learning Periods in Deep Networks, the trace of the FIM first increases and then decreases.

Is there a relationship between these quantities? Are these observations inconsistent with one another?
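For context on how I understand these trace numbers are obtained: PyHessian estimates tr(H) with Hutchinson's method, which only needs Hessian-vector products. Below is a minimal NumPy sketch of that estimator on a toy symmetric matrix standing in for the Hessian (the matrix and probe counts are my own illustrative choices, not from either paper):

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_probes=5000, seed=0):
    """Estimate tr(A) as the mean of v^T A v over Rademacher probes v.

    Only a matrix-vector product `matvec` is needed, which is why the
    same idea works for a neural-network Hessian via Hessian-vector
    products (as PyHessian does), without forming the matrix.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_probes):
        v = rng.integers(0, 2, size=dim) * 2.0 - 1.0  # Rademacher +/-1 entries
        estimates.append(v @ matvec(v))
    return float(np.mean(estimates))

# Toy symmetric matrix standing in for a Hessian
rng = np.random.default_rng(42)
M = rng.standard_normal((20, 20))
A = (M + M.T) / 2.0

est = hutchinson_trace(lambda v: A @ v, dim=20)
print(est, np.trace(A))  # estimate should be close to the exact trace
```

So when the papers report tr(H) or tr(FIM) over training, I read them as curves of this kind of stochastic estimate evaluated at successive checkpoints.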