stable-dreamfusion
Only black images generated and loss=nan after epoch 10
Description
I used the model last week. Some generations worked, while others produced only noise. When I tried it today, every run ended up generating black images with loss=nan. As far as I know, none of the files have changed, and re-cloning the repository did not help. Here is how the training starts:
Then, after epoch 10, it shows this:
The validation folder looks like this:
Running with --albedo does not help either:
python main.py --text "a hamburger" --workspace trial -O --albedo
What I don't understand is why it suddenly stopped working. Any ideas?
Sometimes the loss becomes nan a little after epoch 10, but the generated images are always black after epoch 10. Also, is it normal for the loss to be this low?
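To pin down exactly when training diverges, it can help to check every logged loss value for NaN or infinity instead of reading the progress bar. The sketch below is a minimal, framework-agnostic illustration; the helper name `first_nonfinite_step` is hypothetical (not part of stable-dreamfusion), and it assumes the per-step loss values have already been collected as plain Python floats.

```python
import math
from typing import Iterable, Optional


def first_nonfinite_step(losses: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or +/-inf loss, or None if all are finite.

    Hypothetical helper for illustration; not part of stable-dreamfusion.
    """
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Illustrative values only: the third entry is NaN.
print(first_nonfinite_step([0.12, 0.05, float("nan"), 0.2]))  # 2
print(first_nonfinite_step([0.12, 0.05, 0.01]))               # None
```

In an actual training loop, the same check (`math.isfinite` on a Python float, or `torch.isfinite` on a tensor) could be run right after the loss is computed, so training stops before NaN gradients reach the weights.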
Steps to Reproduce
Use the prompts given in the README.
Expected Behavior
No loss=nan
Environment
Ubuntu 22.04, torch.__version__ = '1.13.1+cu117'
@nazarPuriy Hi, this is strange. Maybe you could try full-precision mode? (Comment out the opt.fp16 = True line in main.py.)
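One common way half precision produces NaN is its limited range: the largest finite IEEE 754 fp16 value is 65504, so larger intermediate values overflow to inf, and arithmetic such as inf - inf then yields NaN. Whether this is what happens in this issue is only an assumption; the sketch below just demonstrates the numeric effect, using the standard library's struct "e" (half-precision) format to round Python floats to fp16. The helper `to_fp16` is hypothetical.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision (hypothetical helper).

    struct's "e" format raises OverflowError for values outside the half
    range; a real fp16 cast produces +/-inf instead, so mimic that here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


a = to_fp16(70000.0)    # above FP16_MAX, so it overflows to inf
b = to_fp16(70000.0)
print(to_fp16(FP16_MAX))  # 65504.0, still representable
print(a, a - b)           # inf nan
```

In float32 (full precision), 70000.0 is representable exactly, so the same computation stays finite, which is one reason disabling opt.fp16 is a reasonable first experiment.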
I think the problem is the NVIDIA driver. I switched from 470 to 525 and it stopped working. Now I am back on 470 and it works again. Any idea why this is happening?
I have the same problem. Maybe it is a bug that needs fixing. Do you have any idea why it happens? Thanks!
Reverting to an older commit seems to work for now. I am facing this issue with torch '2.0.1+cu118'. Training with a lower lr seems to prevent the NaNs, but the results are not good.
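As a toy illustration (not the stable-dreamfusion objective) of why a lower learning rate can prevent NaNs: plain gradient descent on f(x) = x^2 multiplies x by (1 - 2*lr) each step, so any lr above 1.0 makes |x| grow geometrically until it overflows to inf, after which inf - inf produces NaN. The function name `gd_on_quadratic` is hypothetical.

```python
import math


def gd_on_quadratic(lr: float, steps: int, x0: float = 1.0) -> float:
    """Run plain gradient descent on f(x) = x**2 and return the final x.

    The gradient is 2*x, so each update is x <- x - lr * 2 * x = (1 - 2*lr) * x.
    Hypothetical toy example; not the stable-dreamfusion objective.
    """
    x = x0
    for _ in range(steps):
        grad = 2.0 * x
        x = x - lr * grad
    return x


small = gd_on_quadratic(lr=0.1, steps=200)   # factor 0.8 per step: shrinks toward 0
large = gd_on_quadratic(lr=1.5, steps=2000)  # factor -2 per step: overflows, then NaN
print(math.isfinite(small), math.isnan(large))  # True True
```

A smaller lr keeps the per-step factor bounded, but it also changes how far training gets in a fixed number of steps, which may be related to the weaker results reported above.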
In my case, the issue occurs with -O2 but not with -O.