llama-recipes
Analysis of loss spikes in LLaMA pretraining
Dear LLaMA Team,
A huge thank you for making your remarkable work available to the public! I've taken a close look at the pretraining loss curves shown in Figure 1 of LLaMA [1] and Figure 5 of LLaMA 2 [2]. The LLaMA curve shows several loss spikes, yet the LLaMA 2 curve appears completely smooth.
Could it be that the loss curve for LLaMA2 has been smoothed out, or is there another explanation for this difference?
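To illustrate what I mean by "smoothed out": the synthetic sketch below (my own illustration, not from either paper) applies exponential moving average smoothing to a noisy loss series with a few injected spikes, showing how a smoothed plot can make transient spikes nearly disappear.

```python
# Hypothetical sketch: EMA smoothing can visually hide short-lived loss spikes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
steps = np.arange(10_000)

# Synthetic "pretraining loss": decaying trend, noise, and a few brief spikes.
loss = 2.0 + 3.0 * np.exp(-steps / 3_000) + 0.05 * rng.normal(size=steps.size)
for spike_at in (2_500, 6_000, 8_200):
    loss[spike_at : spike_at + 20] += 1.5  # spikes ~20 steps wide

def ema(x, alpha=0.01):
    """Exponential moving average; smaller alpha = heavier smoothing."""
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

plt.plot(steps, loss, alpha=0.4, label="raw loss (spikes visible)")
plt.plot(steps, ema(loss), label="EMA-smoothed loss (spikes nearly vanish)")
plt.xlabel("training step")
plt.ylabel("loss")
plt.legend()
plt.show()
```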
Thanks!
[1] https://arxiv.org/abs/2302.13971
[2] https://arxiv.org/abs/2307.09288