
Analysis of loss spikes in LLaMA pretraining

zhipeng93 opened this issue 11 months ago · 1 comment

Dear LLaMA Team,

A huge thank you for making your remarkable work available to the public! I've taken a close look at the pretraining loss curves shown in Figure 1 of LLaMA [1] and in Figure 5 of LLaMA2 [2]. The LLaMA curve shows several loss spikes, yet LLaMA2's curve appears completely smooth.

[Figure: pretraining loss curve from LLaMA [1], Figure 1]

[Figure: pretraining loss curve from LLaMA2 [2], Figure 5]

Could it be that the loss curve for LLaMA2 has been smoothed out, or is there another explanation for this difference?
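For reference, even moderate exponential-moving-average smoothing of the logged loss (the kind applied by TensorBoard's smoothing slider or by plotting scripts) can make transient spikes nearly invisible. The sketch below is purely illustrative: the `ema_smooth` helper and the synthetic loss curve are my own assumptions, not anything from either paper.

```python
import numpy as np

def ema_smooth(losses, alpha=0.99):
    """Exponential moving average over a logged loss series (hypothetical helper)."""
    smoothed = []
    running = losses[0]
    for x in losses:
        running = alpha * running + (1 - alpha) * x
        smoothed.append(running)
    return np.array(smoothed)

# Synthetic decaying loss curve with a few injected one-step spikes.
steps = np.arange(10_000)
loss = 2.5 * np.exp(-steps / 4000) + 1.5 + 0.01 * np.random.randn(len(steps))
loss[[3000, 6000, 8000]] += 1.0  # transient spikes

smoothed = ema_smooth(loss)
# The raw value jumps by ~1.0 at step 3000, but the smoothed value barely moves.
print(loss[3000], smoothed[3000])
```

So if the LLaMA2 figure was produced from a smoothed series, spikes of similar magnitude could simply be invisible in the plot.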

Thanks!

[1] LLaMA: Open and Efficient Foundation Language Models. https://arxiv.org/abs/2302.13971
[2] Llama 2: Open Foundation and Fine-Tuned Chat Models. https://arxiv.org/abs/2307.09288

zhipeng93 · Mar 06 '24