Open-Sora
Training from scratch: loss becomes NaN
Hi, when I train the t2v model from scratch, the loss becomes NaN. I know it is important to start from a pretrained model such as PixArt, but it is hard to explain why the loss becomes NaN when training from scratch.
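For anyone debugging this, here is a minimal sketch (not Open-Sora's actual training loop; `model`, `training_step`, and the loss below are placeholders) of one way to catch the step where the loss first goes non-finite and keep the run alive while investigating:

```python
# Minimal NaN-debugging sketch: skip updates on non-finite loss and clip gradients.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)  # stand-in for the actual diffusion model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def training_step(batch: torch.Tensor) -> torch.Tensor:
    # stand-in for the real diffusion loss computation
    pred = model(batch)
    return nn.functional.mse_loss(pred, torch.zeros_like(pred))

batch = torch.randn(8, 16)
loss = training_step(batch)

if not torch.isfinite(loss):
    # Skip the update instead of letting NaN/Inf propagate into the weights.
    optimizer.zero_grad(set_to_none=True)
    print("non-finite loss, skipping step")
else:
    loss.backward()
    # Gradient clipping often helps stabilize early from-scratch training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```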
We have not encountered this problem. One possibility is half-precision training: you should use bf16 instead of fp16.
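As a rough illustration (not the exact Open-Sora config; the key name may differ by version), the training configs are Python files and the precision is typically selected with a `dtype` field. The reason bf16 is safer is its numeric range: it keeps fp32's exponent width, so it is far less prone to overflow into Inf/NaN than fp16:

```python
# Illustration of the bf16-vs-fp16 choice; field name is an assumption.
import torch

dtype = "bf16"  # prefer "bf16" over "fp16" when training from scratch

torch_dtype = {"bf16": torch.bfloat16, "fp16": torch.float16}[dtype]

# fp16 overflows above ~6.5e4, while bf16 covers roughly the same range as fp32.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
```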
Thanks, but bf16 is already set in the config file. Did you ever train without the PixArt weights, and was the loss normal in that case?
Our computing resources are limited, so we have not tried training from scratch for very long.
Thanks for your reply! By the way, the newest update is awesome!
@leonardodora I have the same problem. Have you solved it?
Not yet. Maybe PixArt needs to be retrained.