denoising-diffusion-pytorch
Loss always becomes NaN during training, regardless of the parameters used
Training on our own dataset with img_size=256, batch_size=4 or with img_size=128, batch_size=16, every run ends with the loss becoming NaN.
Same problem after 50,000 training steps.
Any idea how to debug this issue?
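One way to start debugging, as a minimal library-agnostic sketch: instead of waiting tens of thousands of steps, stop at the first non-finite loss and print the values leading up to it. This shows whether the loss drifted upward before blowing up or jumped to NaN suddenly. `LossWatchdog` and `NonFiniteLossError` are hypothetical names, not part of denoising-diffusion-pytorch. The sketch assumes you can get each step's loss as a plain float (for a PyTorch tensor, `loss.item()`).

```python
import math
from collections import deque


class NonFiniteLossError(RuntimeError):
    """Raised when the training loss first becomes NaN or infinite."""


class LossWatchdog:
    """Hypothetical helper: remember recent losses, fail fast on NaN/inf.

    Call check(step, loss_value) once per training step with a plain float.
    """

    def __init__(self, history: int = 20):
        # Only the last `history` finite losses are kept for the error report.
        self.recent = deque(maxlen=history)

    def check(self, step: int, loss_value: float) -> None:
        if not math.isfinite(loss_value):
            trail = ", ".join(f"{v:.4g}" for v in self.recent)
            raise NonFiniteLossError(
                f"loss became {loss_value} at step {step}; "
                f"last finite losses: [{trail}]"
            )
        self.recent.append(loss_value)


# Usage with made-up loss values standing in for loss.item() per step.
watchdog = LossWatchdog(history=5)
for step, value in enumerate([0.9, 0.7, 0.6, float("nan")]):
    try:
        watchdog.check(step, value)
    except NonFiniteLossError as err:
        print(err)  # loss became nan at step 3; last finite losses: [0.9, 0.7, 0.6]
        break
```

To locate the operation that first produces a NaN during the backward pass, PyTorch's `torch.autograd.set_detect_anomaly(True)` can also help, at a noticeable cost in speed.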
Hi, have you found the solution?
No... lol
No. I tried training with various parameters for two weeks, and in the end the loss was always NaN. My graphics card is an RTX 3080 Ti.
At 2023-03-15 20:01:39, "Jin Yuntao" wrote:
Hi, have you found the solution?
Did you double-check your data? Make sure none of your data contains NaN values.
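A quick way to act on this suggestion, sketched under assumptions: iterate over the samples and report which ones contain NaN or infinite values. `find_non_finite_samples` is a hypothetical helper, and it assumes each sample can be converted with `np.asarray` (NumPy arrays, nested lists, or CPU tensors). Pixels decoded from ordinary 8-bit image files are integers and cannot be NaN, so this check mostly matters after custom floating-point preprocessing (for example, a per-image normalization that divides by a zero standard deviation).

```python
import numpy as np


def find_non_finite_samples(samples):
    """Return the indices of samples containing NaN or +/-inf values.

    `samples` is any iterable of array-like items that np.asarray accepts.
    """
    bad = []
    for index, sample in enumerate(samples):
        # Cast to float64 so integer inputs are handled by np.isfinite too.
        array = np.asarray(sample, dtype=np.float64)
        if not np.isfinite(array).all():
            bad.append(index)
    return bad


# Usage with small synthetic samples: index 1 has a NaN, index 2 an inf.
data = [np.zeros((3, 4, 4)), np.array([[1.0, np.nan]]), [[0.5, np.inf]]]
print(find_non_finite_samples(data))  # [1, 2]
```

For a PyTorch `Dataset`, the same loop can be fed `(dataset[i] for i in range(len(dataset)))`, assuming each item is a single CPU tensor rather than a tuple.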
I had the same issue. If you are training with `amp = True`, be sure to run the script with `accelerate launch script.py`. That fixed my problem.