Activation logging

Open · saurabh111233212 opened this issue

Log activation statistics for every module.

Updates (@epwalsh):

For each module, we log the activation L2 norm, mean, minimum absolute value, and maximum absolute value. These are reduced over all ranks. Note that, as I implemented it, the L2 norm is reduced by averaging over ranks. I thought that made the most sense because otherwise the scale of the metric would depend on the world size.
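A minimal sketch of the per-rank statistics and the cross-rank reduction described above, in plain Python so it runs without a cluster. `ActivationStats`, `local_stats`, and `reduce_over_ranks` are hypothetical names, lists of floats stand in for activation tensors, and the reduction of the mean by simple averaging is an assumption that is only exact when every rank sees the same number of elements.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ActivationStats:
    """Hypothetical container for one module's activation statistics."""
    l2_norm: float
    mean: float
    abs_min: float
    abs_max: float


def local_stats(values: Sequence[float]) -> ActivationStats:
    """Compute the four statistics on one rank's flattened activations."""
    abs_vals = [abs(v) for v in values]
    return ActivationStats(
        l2_norm=math.sqrt(sum(v * v for v in values)),
        mean=sum(values) / len(values),
        abs_min=min(abs_vals),
        abs_max=max(abs_vals),
    )


def reduce_over_ranks(per_rank: List[ActivationStats]) -> ActivationStats:
    """Combine per-rank statistics into one value per metric."""
    n = len(per_rank)
    return ActivationStats(
        # Averaged, not summed: a sum would grow with the number of ranks.
        l2_norm=sum(s.l2_norm for s in per_rank) / n,
        # Assumption: equal element counts per rank, so a plain mean of means is exact.
        mean=sum(s.mean for s in per_rank) / n,
        abs_min=min(s.abs_min for s in per_rank),
        abs_max=max(s.abs_max for s in per_rank),
    )


rank = local_stats([3.0, -4.0])
# Four identical ranks: the averaged L2 norm stays 5.0, whereas a sum would give 20.0.
print(reduce_over_ranks([rank] * 4).l2_norm)
```

In a real multi-GPU run these reductions would map onto collective ops such as `torch.distributed.all_reduce` with `ReduceOp.SUM` (followed by division by the world size), `ReduceOp.MIN`, and `ReduceOp.MAX`; that mapping is an illustration, not a description of the code in this issue.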

On the small test model I ran, there is only a small hit to throughput. For larger models, we can increase the logging interval if logging slows training down too much.

Oct 12 '23 22:10 saurabh111233212