About the training loss of the stage-2 transformer

Open CrossLee1 opened this issue 1 year ago • 8 comments

Dear authors and everyone,

I'm trying to reproduce the stage-2 transformer training, but I get a very high loss of about 3 to 4.

Could anyone share their stage-2 training curves for reference?

Thanks a lot!

CrossLee1 commented on Jun 29 '23
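
For context on whether 3 to 4 is "very high": as far as I know, the stage-2 transformer is trained with a per-token cross-entropy over the first-stage codebook indices, so the loss can be read against a no-learning baseline. The sketch below assumes the logged loss is the mean cross-entropy in nats (natural log, as `torch.nn.functional.cross_entropy` computes it) and uses a hypothetical codebook size of 1024; replace it with the codebook size (`n_embed`) from your own first-stage config.

```python
import math


def uniform_baseline_nats(codebook_size: int) -> float:
    """Cross-entropy (nats) of a model that gives every code equal probability."""
    return math.log(codebook_size)


def perplexity(loss_nats: float) -> float:
    """Effective number of equally likely codes implied by a mean loss in nats."""
    return math.exp(loss_nats)


if __name__ == "__main__":
    K = 1024  # hypothetical codebook size; read the real value from your config
    print(f"uniform baseline: {uniform_baseline_nats(K):.2f} nats")
    for loss in (3.0, 3.5, 4.0):
        print(f"loss {loss:.1f} nats -> perplexity {perplexity(loss):.1f} of {K} codes")
```

With K = 1024 the uniform baseline is about 6.93 nats, so a loss of 3 to 4 nats means the model has narrowed each position down to roughly 20 to 55 equally likely codes, well below the no-learning baseline.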

I'm also interested in this, as I see the same behavior. Is that normal?

nicolasfischoeder commented on Aug 01 '23

@CrossLee1, have you figured it out by now?

@pesser @rromb Could you comment on this? :)

nicolasfischoeder commented on Aug 01 '23

My loss is even higher, about 6 (training on ffhq256), and it won't go down. Have you figured it out?

order-a-lemonade commented on Oct 19 '23
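
A loss around 6 that stops decreasing can be compared with a second baseline: the entropy of the code-index histogram itself. A model that predicts the same distribution at every position, ignoring both context and position, cannot do better than this on average, so a training loss that plateaus near it suggests the transformer is not using the preceding tokens (for example because of a bug in modified source code). This is a sketch under the assumption that you can collect the code indices your first-stage model assigns to a sample of training images; the toy list in the demo is made up.

```python
import math
from collections import Counter
from typing import Iterable


def unigram_entropy_nats(code_indices: Iterable[int]) -> float:
    """Entropy (nats) of the empirical distribution of code indices.

    This is the best average cross-entropy achievable by a model that outputs
    one fixed distribution for every token on the same data.
    """
    counts = Counter(code_indices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no code indices given")
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy indices for illustration only; use indices from your own encoded data.
    toy = [0, 0, 1, 2, 2, 2, 3, 3]
    print(f"unigram entropy: {unigram_entropy_nats(toy):.3f} nats")
```

If the stage-2 loss sits at or above this number after many steps, it is worth checking the data pipeline and any local code changes before tuning hyperparameters.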

A loss below 5 is good enough to generate reasonable images; mine is 4.6 on CelebAHQ. Have you tried sampling some images?

robertchen245 commented on Dec 26 '23
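
Sampling is a more direct check than the raw loss value, as the comment above suggests. Autoregressive samplers for this kind of model commonly draw each next code from the transformer's logits, often restricted to the k most likely codes and rescaled by a temperature. The sketch below shows one such step in plain Python with hypothetical logits; it is a generic top-k sampler, not the repository's sampling script, and the `k` and `temperature` values are illustrative.

```python
import math
import random
from typing import Optional, Sequence


def sample_top_k(logits: Sequence[float], k: int, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one code index from the k largest logits after temperature scaling."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    # Indices of the k highest-scoring codes; all others get zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Softmax over the kept logits; subtracting the max avoids overflow in exp().
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical logits for a 6-code vocabulary; real ones come from the transformer.
    toy_logits = [2.0, 0.5, 1.5, -1.0, 0.0, 1.0]
    rng = random.Random(0)
    draws = [sample_top_k(toy_logits, k=3, temperature=0.8, rng=rng) for _ in range(10)]
    print(draws)  # only indices 0, 2 and 5 can appear
```

Repeating this step for every position of the latent grid and decoding the resulting indices with the first-stage decoder gives an image, which is usually more informative than comparing loss values across datasets.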

I have forgotten what the problem was (it seems I had edited the source code and introduced some bugs), but my training loss is normal now.

order-a-lemonade commented on Dec 26 '23

You mentioned that you changed the lr to 4.5e-6. How many epochs did you train for, and with what batch_size? My sampled images look okay, but I'd like the loss to be much smaller 😂

robertchen245 commented on Dec 26 '23

I'm not sure where I mentioned modifying the lr 🤣, but my batch_size is 1. By the way, I don't think training for more steps helps sample quality much: images generated by my checkpoint at 400k steps are not obviously better than those generated by the authors' released checkpoint, which is at step 13750.

order-a-lemonade commented on Dec 26 '23
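
The lr and step numbers above are easier to compare with the scaling in mind. As far as I can tell, the repository's `main.py` does not use `base_learning_rate` directly but multiplies it by `accumulate_grad_batches`, the number of GPUs, and `batch_size`; with batch_size 1 on a single GPU the effective lr is therefore just the base value. Step counts are likewise only comparable between runs with the same samples per step. The helpers below are a hypothetical sketch of that arithmetic, assuming "step" means one optimizer step; `dataset_size` in the demo is a placeholder, not the real size of any training split.

```python
def effective_learning_rate(base_lr: float, batch_size: int, num_gpus: int = 1,
                            accumulate_grad_batches: int = 1) -> float:
    """base_lr scaled linearly by the number of samples per optimizer step."""
    return accumulate_grad_batches * num_gpus * batch_size * base_lr


def steps_to_epochs(steps: int, batch_size: int, dataset_size: int, num_gpus: int = 1,
                    accumulate_grad_batches: int = 1) -> float:
    """Approximate passes over the data covered by `steps` optimizer steps.

    Ignores partially filled last batches, so treat the result as an estimate.
    """
    samples_per_step = batch_size * num_gpus * accumulate_grad_batches
    return steps * samples_per_step / dataset_size


if __name__ == "__main__":
    # batch_size=1 on one GPU, as in the comment above: the lr equals base_lr.
    print(f"{effective_learning_rate(4.5e-6, batch_size=1):.2e}")
    # dataset_size is a hypothetical placeholder; use the length of your training split.
    print(f"{steps_to_epochs(400_000, batch_size=1, dataset_size=60_000):.1f} epochs")
```

If the released checkpoint was trained with a larger batch, its 13750 steps may correspond to as many or more epochs than 400k steps at batch_size 1, which would be consistent with the similar sample quality.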