
INFO: Found overflow. Skip step.

Open stephencurry-web opened this issue 1 year ago • 0 comments

I trained Llama2-7B-chat on the Alpaca dataset. With the batch size set to 2 or 4, the message "INFO: Found overflow. Skip step." appeared at every step of the entire training run, and the gradients were NaN. With the batch size set to 1, everything works fine. What could be the reason?
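For context, a message like this usually comes from dynamic loss scaling in mixed-precision (fp16) training: the scaler inspects the gradients, and if any value is inf or NaN it skips the optimizer step and lowers the loss scale. The sketch below is a generic pure-Python illustration of that logic, not ColossalAI's actual code; the class name `DynamicLossScaler` and all of its parameters are hypothetical.

```python
import math


class DynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (illustration only)."""

    def __init__(self, init_scale=2.0 ** 16, backoff_factor=0.5,
                 growth_factor=2.0, growth_interval=1000, min_scale=1.0):
        self.scale = init_scale
        self.backoff_factor = backoff_factor
        self.growth_factor = growth_factor
        self.growth_interval = growth_interval
        self.min_scale = min_scale
        self.good_steps = 0  # consecutive steps without overflow

    @staticmethod
    def has_overflow(grads):
        # inf and NaN both count as "overflow" for this check.
        return any(not math.isfinite(g) for g in grads)

    def update(self, grads):
        """Return True if the optimizer step should run, False if it is skipped."""
        if self.has_overflow(grads):
            # Skip the step and shrink the scale, never going below min_scale.
            self.scale = max(self.scale * self.backoff_factor, self.min_scale)
            self.good_steps = 0
            return False
        self.good_steps += 1
        # After enough clean steps in a row, try a larger scale again.
        if self.good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor
        return True


if __name__ == "__main__":
    scaler = DynamicLossScaler(init_scale=8.0, growth_interval=2)
    for grads in ([0.1, float("inf")], [0.1, 0.2], [float("nan")]):
        stepped = scaler.update(grads)
        print("step" if stepped else "skip", scaler.scale)
```

Under these assumptions, an occasional skipped step is harmless because a smaller scale removes an inf caused by scaling alone. If the gradients are NaN at every step, as described here, shrinking the scale cannot help, so every step would be skipped; that would suggest the NaN originates somewhere other than the loss scale.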

stephencurry-web, Nov 29 '23 10:11