DeepSeek-Coder
Training loss extremely noisy during fine-tuning and randomly goes to 0
I'm trying to fine-tune the 6.7B model on my own code dataset. I'm running multi-node training in fp32 on NVIDIA Tesla V100 GPUs with DeepSpeed ZeRO Stage 3. The training loss fluctuates wildly and randomly drops to zero; I've attached my training loss graph below.
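For context, the relevant DeepSpeed settings look roughly like this (a simplified sketch rather than the exact config file, and it assumes the config is passed through the Hugging Face Trainer integration, which fills in the "auto" values):

```python
# Simplified sketch of the DeepSpeed config: ZeRO Stage 3 with mixed precision
# disabled, since the run is pure fp32 on V100s. Values marked "auto" are
# resolved by the Hugging Face Trainer integration.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 3,
    },
    "fp16": {"enabled": False},
    "bf16": {"enabled": False},
}
```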
I'm running this on 128 GPUs with a per-device train batch size of 1 and no gradient accumulation. I'm not sure what could be causing this, since I haven't seen it happen with other Llama-architecture models. I'd appreciate any general direction to help debug this, thanks!
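One thing I can rule out on the data side is examples where nearly all label tokens end up masked after tokenization/truncation. Here's a small sanity-check sketch for that (names like `tokenized_dataset` are placeholders for whatever my tokenization pipeline produces; it assumes the usual convention of `-100` marking tokens excluded from the loss):

```python
from collections import Counter

def count_supervised_tokens(tokenized_dataset):
    """Histogram of how many tokens per example actually contribute to the loss.

    Assumes each example has a `labels` list where -100 means "ignored".
    """
    counts = Counter()
    for example in tokenized_dataset:
        n_supervised = sum(1 for t in example["labels"] if t != -100)
        counts[n_supervised] += 1
    return counts

# Usage (hypothetical):
# hist = count_supervised_tokens(tokenized_dataset)
# print(sorted(hist.items())[:10])  # look for examples with 0 or very few supervised tokens
```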
@DejianYang @pkuzqh Would appreciate any help on this ticket, thanks