
Weird loss and reconstruction results when training the autoencoder

Open OwalnutO opened this issue 2 years ago • 4 comments

I'm trying to train the AE on my own dataset (~100k images) with the default config file autoencoder_kl_32x32x4.yaml. I only decreased the learning rate from 4.5e-6 to 1e-6 since I use a smaller batch size. However, the training losses are weird and the reconstruction results are unsatisfactory. Could anyone give some suggestions? Thanks in advance!

[Attached images: training loss curves, reconstruction results, GT image]
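One thing worth checking before lowering the base LR by hand: the latent-diffusion training script already scales the base learning rate by the batch size and GPU count, so the effective LR shrinks automatically with a smaller batch. A minimal sketch of that scaling rule (function name and example values here are illustrative, not the poster's actual settings):

```python
def effective_lr(base_lr, batch_size, n_gpus=1, accumulate_grad_batches=1):
    """Effective LR as computed in the latent-diffusion training script:
    accumulate_grad_batches * n_gpus * batch_size * base_lr."""
    return accumulate_grad_batches * n_gpus * batch_size * base_lr

# With the config's base LR of 4.5e-6 and a hypothetical batch size of 12 on 1 GPU:
lr = effective_lr(4.5e-6, batch_size=12)
```

If you both shrink the batch size and manually lower base_lr, the effective LR may end up much smaller than intended, which can slow convergence rather than stabilize it.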

OwalnutO avatar Aug 24 '23 02:08 OwalnutO

I think kl_weight might be too small; you could try increasing kl_weight in the config.
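For reference, kl_weight lives under the loss config of the autoencoder. A sketch of the relevant fragment of autoencoder_kl_32x32x4.yaml (the default value shown is an assumption; check your local copy):

```yaml
model:
  params:
    lossconfig:
      target: ldm.modules.losses.LPIPSWithDiscriminator
      params:
        kl_weight: 0.000001   # default; try raising e.g. to 1e-5 or 1e-4
```

A larger kl_weight regularizes the latent distribution more strongly toward the prior, at some cost in reconstruction fidelity, so it is worth tuning gradually.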

illrayy avatar Aug 28 '23 15:08 illrayy

@OwalnutO Hi, I've encountered the same problem. Have you solved it by increasing kl_weight? Thank you!

houqingying avatar Oct 12 '23 03:10 houqingying

You may find answers in #187. By the way, why does the reconstruction have a different size from the GT image?

GuHuangAI avatar Oct 24 '23 14:10 GuHuangAI