
question regarding training stability

lyccol opened this issue 1 year ago · 0 comments

I have a question regarding training stability. I downloaded the complete RedPajama v1 dataset from Hugging Face and followed the parameter settings from the LLaMA 1 paper for the data mixture and hyperparameters. I trained two model sizes, 1.8B and 7B. Unfortunately, the 7B model's loss began to rise after about 300 billion tokens, and the 1.8B model's loss rose similarly after about 250 billion tokens. How can I address this training instability?
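For context, here is a minimal sketch of the optimizer and schedule setup I followed. The AdamW betas, weight decay, warmup length, cosine decay to 10% of peak, gradient clipping at 1.0, and the 3e-4 peak learning rate for 7B are the LLaMA 1 paper values; the model and total step count below are just placeholders, not my actual configuration:

```python
import math
import torch

max_lr = 3e-4          # LLaMA 1 peak LR for the 7B model
min_lr = max_lr * 0.1  # cosine decays to 10% of the peak
warmup_steps = 2000    # LLaMA 1 warmup
total_steps = 500_000  # placeholder; depends on batch size and token budget

model = torch.nn.Linear(16, 16)  # stand-in for the actual transformer

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.95),   # LLaMA 1 paper values
    weight_decay=0.1,
)

def lr_at(step: int) -> float:
    """Linear warmup, then cosine decay from max_lr down to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Inside the training loop, after loss.backward(): clip gradients at norm 1.0,
# also per the LLaMA 1 recipe.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```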

[Loss curves for the 1.8B and 7B models]

lyccol · Dec 25 '23 08:12