loss=nan
Hello, thank you very much for your excellent work. When I tried to reproduce it recently, training failed with "loss=nan". I was training on an A5000 GPU. What could be the reason for this?
Hi, thank you for posting this issue. We did not observe this problem in our testing. It may be caused by some corrupted data samples, but I am not sure. Can you first try filtering the loss with the function torch.nan_to_num and see if it helps?
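A minimal sketch of what that suggestion could look like, assuming a standard PyTorch training loop. The helper names `sanitize_loss` and `find_bad_samples` are hypothetical and not part of this repository. The first replaces a non-finite loss using torch.nan_to_num; the second reports which samples in a batch contain NaN or inf values, which may help confirm or rule out corrupted data.

```python
from typing import List, Tuple

import torch


def sanitize_loss(loss: torch.Tensor) -> Tuple[torch.Tensor, bool]:
    """Return a finite loss plus a flag saying whether the input was finite.

    Hypothetical helper for illustration only.
    """
    was_finite = bool(torch.isfinite(loss).all())
    # Replace NaN, +inf and -inf with 0.0 so the value itself is finite.
    clean = torch.nan_to_num(loss, nan=0.0, posinf=0.0, neginf=0.0)
    return clean, was_finite


def find_bad_samples(batch: torch.Tensor) -> List[int]:
    """Return indices of samples (along dim 0) containing NaN or inf."""
    flat = batch.reshape(batch.shape[0], -1)
    bad = ~torch.isfinite(flat).all(dim=1)
    return torch.nonzero(bad).flatten().tolist()


if __name__ == "__main__":
    # Assumed usage: check the batch before the forward pass,
    # then sanitize the loss before calling backward().
    batch = torch.ones(4, 3, 8, 8)
    batch[2, 0, 0, 0] = float("nan")
    print("bad sample indices:", find_bad_samples(batch))

    loss, was_finite = sanitize_loss(torch.tensor(float("nan")))
    print("loss after filtering:", loss.item(), "was finite:", was_finite)
```

Filtering the loss only hides the symptom, so it is worth logging the batches for which the flag comes back False and inspecting those samples.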
Hello, thank you very much for your reply. The problem occurs during training with the code unmodified: "loss=nan" appears in the first or second round of training. I tried adjusting the learning rate, but it didn't help. I am using an A5000 graphics card for the reproduction.