MuseTalk

Loss becomes NaN after training for a while

Open zhanglv0209 opened this issue 1 year ago • 5 comments

After training for a while, the loss becomes NaN. What causes this? Is something wrong?

Steps: 2%|▍ | 92679/5900000 [19:15:43<1185:20:02, 1.36it/s, backward=0.177, data=0.00789, img_process=0.00704, loss=nan, lr=4.99e-5, unet=0.0986, vae=0.0144]

The images in val at the same point: (image)

zhanglv0209 · Nov 26 '24 05:11
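To pin down the step at which the loss first turns non-finite, a small guard can be called once per training step. The following is a minimal sketch in plain Python and is not part of MuseTalk's training script: `NonFiniteLossGuard` and its `patience` parameter are hypothetical names, and the loss values in the usage loop are made up for illustration. In a PyTorch loop the value passed in would typically be `loss.item()`.

```python
import math


class NonFiniteLossGuard:
    """Track per-step loss values and flag non-finite ones.

    Hypothetical helper for illustration; not part of MuseTalk.
    """

    def __init__(self, patience: int = 1):
        # Consecutive non-finite losses tolerated before asking to stop.
        self.patience = patience
        self.bad_streak = 0
        self.first_bad_step = None   # first step with NaN or Inf
        self.last_good_step = None   # most recent step with a finite loss

    def update(self, step: int, loss: float) -> bool:
        """Record one loss value; return True if training should stop."""
        if math.isfinite(loss):
            self.bad_streak = 0
            self.last_good_step = step
            return False
        if self.first_bad_step is None:
            self.first_bad_step = step
        self.bad_streak += 1
        return self.bad_streak >= self.patience


# Usage with made-up loss values.
guard = NonFiniteLossGuard(patience=2)
losses = [0.91, 0.47, float("nan"), 0.45, float("nan"), float("inf")]
stopped_at = None
for step, loss in enumerate(losses):
    if guard.update(step, loss):
        stopped_at = step
        break
```

Stopping as soon as the guard fires leaves `last_good_step` pointing at a step whose weights are still usable, which helps when deciding which saved checkpoint to inspect.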

Hello, have you solved this?

Echo-jyt · Dec 17 '24 01:12

I ran into the same problem: the loss became NaN at around 5,000 steps. Did you train from scratch, or continue from the pretrained weights?

foreverhell · Dec 24 '24 10:12

Quoting Echo-jyt: "Hello, have you solved this?"

Resuming from a checkpoint does not work. Training from scratch went fine.

zhanglv0209 · Dec 24 '24 16:12
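The thread does not identify why resuming leads to NaN. One check that can be run before resuming is whether the saved state already contains non-finite values. The sketch below is an assumption-laden illustration, not MuseTalk code: `find_non_finite` is a hypothetical helper, the checkpoint layout is invented, and it only handles dicts, lists, tuples, and plain Python floats. For real PyTorch tensors the equivalent per-tensor check would be `torch.isfinite(t).all()`.

```python
import math


def find_non_finite(state, prefix=""):
    """Return dotted paths of every non-finite float in a nested structure.

    Hypothetical helper; assumes `state` is built from dicts,
    lists/tuples, and plain Python numbers.
    """
    bad = []
    if isinstance(state, dict):
        for key, value in state.items():
            bad.extend(find_non_finite(value, f"{prefix}{key}."))
    elif isinstance(state, (list, tuple)):
        for index, value in enumerate(state):
            bad.extend(find_non_finite(value, f"{prefix}{index}."))
    elif isinstance(state, float) and not math.isfinite(state):
        bad.append(prefix.rstrip("."))
    return bad


# Invented checkpoint layout, for illustration only.
checkpoint = {
    "model": {"unet_weight": [0.12, -0.5, float("nan")]},
    "optimizer": {"exp_avg": [0.0, float("inf")], "lr": 4.99e-5},
}
bad_paths = find_non_finite(checkpoint)
```

An empty result does not prove the checkpoint is safe to resume from, but a non-empty one shows the values were already corrupted when the checkpoint was written.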

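Quoting foreverhell: "I ran into the same problem: the loss became NaN at around 5,000 steps. Did you train from scratch, or continue from the pretrained weights?"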

Resuming from a checkpoint does not work. Training from scratch went fine.

zhanglv0209 · Dec 24 '24 16:12

@zhanglv0209 How much data are you training the model on? I trained on 5 minutes of conversation. The loss decreased throughout training, but the resulting model was no different from the original model. I trained for 100,000 epochs on an L40S.

Siziff · Feb 13 '25 14:02