fsdp_qlora

Loss is NaN when the input length is large

Open bilalghanem opened this issue 10 months ago • 5 comments

Hi

Thanks for your efforts, folks! While testing the code on my own dataset, I found that when the input length is large (~4000), the loss becomes NaN from the very first step:

Epoch 0, Loss nan, LR 1.00e-05: 12%|█████

On the same dataset, when I truncate my inputs to something shorter, the loss is reported normally. What could be causing this?
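One possible cause worth ruling out (an assumption, not something confirmed in this thread): if the training context length is shorter than the sample and truncation cuts off the end, every remaining label may be the ignore value, and a mean over zero supervised tokens is 0/0, i.e. NaN. The sketch below is plain Python with hypothetical helper names and a made-up sample layout; it only illustrates the arithmetic and is not code from fsdp_qlora.

```python
import math

# -100 is the label value PyTorch's cross-entropy loss skips by default.
IGNORE_INDEX = -100


def truncate(labels, max_len):
    """Right-truncate a label sequence to max_len (hypothetical helper)."""
    return labels[:max_len]


def count_supervised(labels):
    """Number of positions that actually contribute to the loss."""
    return sum(1 for t in labels if t != IGNORE_INDEX)


def mean_loss(per_token_losses, labels):
    """Mean over supervised positions; NaN when there are none (0/0)."""
    kept = [l for l, t in zip(per_token_losses, labels) if t != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else math.nan


# Made-up sample: 4000 masked prompt tokens followed by 50 response tokens.
labels = [IGNORE_INDEX] * 4000 + [1] * 50
# Assumed context length of 2048; substitute whatever your run uses.
cut = truncate(labels, 2048)

print(count_supervised(labels))   # supervised tokens before truncation
print(count_supervised(cut))      # supervised tokens after truncation
print(mean_loss([0.5] * len(cut), cut))
```

If the count after truncation is zero for any sample in a batch, that would explain a NaN from the first step. If it is nonzero everywhere, the cause lies elsewhere (for example numerical overflow in reduced precision), which this check does not cover.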

bilalghanem, Apr 06 '24 00:04