PyHessian
Hi, a question about an apparent inconsistency between the dynamics of tr(FIM) and tr(H).
Hi, thanks for your awesome work!
I noticed that in the paper PyHessian: Neural Networks Through the Lens of the Hessian, tr(H) keeps increasing during training.

And in the paper Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, the dominant eigenvalue of the Hessian with respect to the weights can decrease during small-batch training.

And in the paper Critical Learning Periods in Deep Networks, the trace of the FIM first increases and then decreases.

Is there a relationship between these quantities? Are these observations inconsistent with one another?
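For context on how I understand these trace numbers are obtained: PyHessian estimates tr(H) with Hutchinson's method, which only needs Hessian-vector products. Below is a minimal NumPy sketch of that estimator on a toy symmetric matrix standing in for the Hessian (the matrix and probe counts are my own illustrative choices, not from either paper):

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_probes=5000, seed=0):
    """Estimate tr(A) as the mean of v^T A v over Rademacher probes v.

    Only a matrix-vector product `matvec` is needed, which is why the
    same idea works for a neural-network Hessian via Hessian-vector
    products (as PyHessian does), without forming the matrix.
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_probes):
        v = rng.integers(0, 2, size=dim) * 2.0 - 1.0  # Rademacher +/-1 entries
        estimates.append(v @ matvec(v))
    return float(np.mean(estimates))

# Toy symmetric matrix standing in for a Hessian
rng = np.random.default_rng(42)
M = rng.standard_normal((20, 20))
A = (M + M.T) / 2.0

est = hutchinson_trace(lambda v: A @ v, dim=20)
print(est, np.trace(A))  # estimate should be close to the exact trace
```

So when the papers report tr(H) or tr(FIM) over training, I read them as curves of this kind of stochastic estimate evaluated at successive checkpoints.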