fsdp_qlora
NaN loss when the input length is large
Hi
Thanks for your efforts, folks! While testing the code on my own dataset, I found that when the input length is large (~4000), the loss is NaN from the very first step: Epoch 0, Loss nan, LR 1.00e-05: 12%|█████
On the same dataset, when I truncate the inputs to something shorter, I get a valid loss value. What could be the problem?
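In case it helps narrow this down, one possibility I have not ruled out is that long samples lose their whole response when they are cut to the context length, so every remaining label is masked. This is only a guess on my side, not something I have confirmed in this repo. It assumes prompt tokens are masked with -100, the default `ignore_index` of `torch.nn.CrossEntropyLoss`, which returns NaN under mean reduction when every target is ignored. A minimal sketch of the check, where the function names, `max_len`, and the toy data are all hypothetical:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def count_supervised_tokens(labels, max_len):
    """Count label positions that still contribute to the loss after truncation."""
    truncated = labels[:max_len]
    return sum(1 for tok in truncated if tok != IGNORE_INDEX)


def find_unsupervised_samples(dataset_labels, max_len):
    """Return indices of samples whose labels are entirely masked after truncation."""
    return [
        i
        for i, labels in enumerate(dataset_labels)
        if count_supervised_tokens(labels, max_len) == 0
    ]


# Hypothetical toy data: prompt tokens are masked with -100, response tokens are kept.
toy_labels = [
    [IGNORE_INDEX] * 3 + [11, 12],      # short prompt: the response survives truncation
    [IGNORE_INDEX] * 6 + [21, 22, 23],  # long prompt: the response is cut off at max_len=5
]

print(find_unsupervised_samples(toy_labels, max_len=5))  # prints [1]
```

If this returns any indices for my real data at the context length I am training with, those samples alone would be enough to produce a NaN loss; if it returns nothing, the cause is somewhere else.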