Loss becomes NaN after training for a while
After training for a while, loss=nan. What causes this? Is something wrong?
Steps: 2%|▍ | 92679/5900000 [19:15:43<1185:20:02, 1.36it/s, backward=0.177, data=0.00789, img_process=0.00704, loss=nan, lr=4.99e-5, unet=0.0986, vae=0.0144]
The images in val at the same time:
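Hello, have you solved this?
I ran into the same problem: the loss became NaN at around 5,000 steps. Did you train from scratch, or continue from the pretrained weights?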
Resuming from a checkpoint doesn't work; once I started from scratch the problem went away.
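For anyone else debugging this, here is a minimal sketch (not taken from this repo) of a guard that records the first step where the loss stops being finite and tells the loop to stop. That makes it easier to see whether the NaN shows up right after a resume or only later. `NonFiniteLossGuard` and `patience` are hypothetical names made up for illustration; in a PyTorch loop you would presumably pass `loss.item()` as `loss_value`.

```python
import math


class NonFiniteLossGuard:
    """Tracks the first step at which the loss is NaN or infinite."""

    def __init__(self, patience: int = 1):
        # patience (hypothetical knob): number of consecutive
        # non-finite losses after which training should stop.
        self.patience = patience
        self.bad_streak = 0
        self.first_bad_step = None

    def check(self, step: int, loss_value: float) -> bool:
        """Return True to keep training, False to stop."""
        if math.isfinite(loss_value):
            self.bad_streak = 0
            return True
        if self.first_bad_step is None:
            self.first_bad_step = step
        self.bad_streak += 1
        return self.bad_streak < self.patience


# Usage with made-up loss values (not real training output).
guard = NonFiniteLossGuard(patience=2)
losses = [0.31, 0.27, float("nan"), float("nan"), 0.25]
stopped_at = None
for step, value in enumerate(losses):
    if not guard.check(step, value):
        stopped_at = step
        break
```

Stopping early like this only localizes the problem; it does not explain why the loss diverged.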
@zhanglv0209 How much data are you training the model on? I trained on 5 minutes of conversation; the loss decreased throughout training, but the resulting model was no different from the original. I trained for 100,000 epochs on an L40S.
