llama-recipes
Analysis of loss spikes in LLaMA pretraining
Dear LLaMA Team,
A huge thank you for making your remarkable work available to the public! I've taken a close look at the pretraining loss curves shown in Figure 1 of LLaMA [1] and Figure 5 of LLaMA 2 [2]. The LLaMA curve shows several loss spikes, yet the LLaMA 2 curve appears completely smooth.
Could it be that the loss curve for LLaMA2 has been smoothed out, or is there another explanation for this difference?
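To illustrate what I mean by "smoothed out": the synthetic sketch below (my own illustration, not from either paper) applies exponential moving average smoothing to a noisy loss series with a few injected spikes, showing how a smoothed plot can make transient spikes nearly disappear.

```python
# Hypothetical sketch: EMA smoothing can visually hide short-lived loss spikes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
steps = np.arange(10_000)

# Synthetic "pretraining loss": decaying trend, noise, and a few brief spikes.
loss = 2.0 + 3.0 * np.exp(-steps / 3_000) + 0.05 * rng.normal(size=steps.size)
for spike_at in (2_500, 6_000, 8_200):
    loss[spike_at : spike_at + 20] += 1.5  # spikes ~20 steps wide

def ema(x, alpha=0.01):
    """Exponential moving average; smaller alpha = heavier smoothing."""
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    return out

plt.plot(steps, loss, alpha=0.4, label="raw loss (spikes visible)")
plt.plot(steps, ema(loss), label="EMA-smoothed loss (spikes nearly vanish)")
plt.xlabel("training step")
plt.ylabel("loss")
plt.legend()
plt.show()
```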
Thanks!
[1] https://arxiv.org/abs/2302.13971
[2] https://arxiv.org/abs/2307.09288