ColossalAI
INFO: Found overflow. Skip step.
I trained Llama2-7B-chat on the Alpaca dataset. When I set the batch size to 2 or 4, "INFO: Found overflow. Skip step." is printed at every step of the entire training run, and the gradients are NaN. Everything works fine when I set the batch size to 1. What could be the reason?
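
For context, here is my understanding of where the message comes from, as a rough sketch rather than ColossalAI's actual code. I assume fp16 mixed precision with a dynamic loss scaler: when any gradient is non-finite, the optimizer step is skipped and the loss scale is reduced. The class, function names, and default values below are hypothetical and only illustrate that logic in plain Python. If the gradients are NaN rather than just too large, reducing the scale cannot help, which would match the message appearing at every step.

```python
import math


class DynamicLossScaler:
    """Hypothetical, simplified dynamic loss scaler (not ColossalAI's class)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=1000, min_scale=1.0):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.min_scale = min_scale
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, found_overflow):
        if found_overflow:
            # Shrink the scale (never below min_scale) and restart the count.
            self.scale = max(self.scale * self.backoff_factor, self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            # Grow the scale again after enough clean steps in a row.
            if self.good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self.good_steps = 0


def has_overflow(grads):
    """True if any gradient value is inf or NaN."""
    return any(not math.isfinite(g) for g in grads)


def train_step(scaler, grads):
    """Return True if the optimizer step would be applied, False if skipped."""
    overflow = has_overflow(grads)
    if overflow:
        print("INFO: Found overflow. Skip step.")
    scaler.update(overflow)
    return not overflow


if __name__ == "__main__":
    scaler = DynamicLossScaler(init_scale=8.0)
    # NaN gradients at every step: each step is skipped and the scale
    # shrinks until it reaches min_scale, where it stays.
    for _ in range(5):
        train_step(scaler, [0.1, float("nan")])
    print("final scale:", scaler.scale)
```

If this picture is right, an occasional skipped step would be normal while the scale settles, but a skip at every step suggests the NaN originates in the forward or backward pass itself rather than from the loss scale being too large.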