
question regarding training stability

lyccol opened this issue 1 year ago · 0 comments

I have a question regarding training stability. I downloaded the complete RedPajama v1 dataset from Hugging Face and followed the parameter settings from the LLaMA 1 paper for the data mixture and hyperparameters. I trained two model sizes, 1.8B and 7B. Unfortunately, the 7B model's loss began to rise after about 300 billion tokens, and the 1.8B model's loss rose similarly after about 250 billion tokens. How can I address this training instability?
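For context, here is a minimal sketch of the optimizer and schedule setup I followed. The AdamW betas, weight decay, warmup length, cosine decay to 10% of peak, gradient clipping at 1.0, and the 3e-4 peak learning rate for 7B are the LLaMA 1 paper values; the model and total step count below are just placeholders, not my actual configuration:

```python
import math
import torch

max_lr = 3e-4          # LLaMA 1 peak LR for the 7B model
min_lr = max_lr * 0.1  # cosine decays to 10% of the peak
warmup_steps = 2000    # LLaMA 1 warmup
total_steps = 500_000  # placeholder; depends on batch size and token budget

model = torch.nn.Linear(16, 16)  # stand-in for the actual transformer

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=max_lr,
    betas=(0.9, 0.95),   # LLaMA 1 paper values
    weight_decay=0.1,
)

def lr_at(step: int) -> float:
    """Linear warmup, then cosine decay from max_lr down to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Inside the training loop, after loss.backward(): clip gradients at norm 1.0,
# also per the LLaMA 1 recipe.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```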

[Loss curves for the 1.8B and 7B models]

lyccol · Dec 25 '23 08:12