
autoencoder for LDM

Open seung-kim opened this issue 3 years ago • 7 comments

Hi! Could you add a table showing which autoencoder (first-stage) models correspond to which LDMs, please? Maybe I am missing this information somewhere, but it isn't clear which one goes with which.

seung-kim avatar Jan 17 '22 17:01 seung-kim

@seung-kim I was struggling with this too. I ran scripts/download_first_stages.sh, which downloaded all the autoencoders. Each autoencoder ships with a config.yaml file that lists the training data as ldm.data.openimages.FullOpenImagesTrain, so it seems they were all trained on the OpenImages dataset?

@ablattmann @rromb could you please confirm this and also add the information to the README?
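
In case it helps others, here is a minimal sketch for scanning the downloaded configs and printing their dataset targets. It assumes the default models/first_stage_models/ layout produced by the download script; adjust the glob if your layout differs.

```python
# Minimal sketch: list the data targets in the downloaded first-stage configs.
# Assumes PyYAML is installed and that scripts/download_first_stages.sh
# unpacked the models under models/first_stage_models/<name>/.
import glob
import yaml

for path in sorted(glob.glob("models/first_stage_models/*/config.yaml")):
    with open(path) as f:
        cfg = yaml.safe_load(f)

    print(path)
    data_cfg = cfg.get("data") or {}
    print("  data module:", data_cfg.get("target"))
    # Each split entry names its dataset class,
    # e.g. ldm.data.openimages.FullOpenImagesTrain
    for split, split_cfg in (data_cfg.get("params") or {}).items():
        if isinstance(split_cfg, dict) and "target" in split_cfg:
            print(f"  {split}: {split_cfg['target']}")
```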

vvvm23 avatar Mar 13 '22 11:03 vvvm23


I have the same question. Have you figured it out yet?

Eudea avatar Oct 20 '22 01:10 Eudea

The class FullOpenImagesTrain does not exist in this repository; it looks like the file ldm/data/openimages.py is missing. Could you check that, @rromb @ablattmann?
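
In the meantime, here is a hypothetical stand-in, guessed from how the other ldm.data datasets behave (each item is a dict with an "image" array in [-1, 1]). It is not the authors' original implementation, and data_root is a placeholder.

```python
# Hypothetical replacement for the missing ldm/data/openimages.py.
# The interface mirrors the other ldm.data datasets; the original
# preprocessing used by the authors is unknown.
import glob
import os

import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class FullOpenImagesTrain(Dataset):
    def __init__(self, data_root="data/openimages/train", size=256):
        # data_root is a placeholder; point it at your local OpenImages copy.
        self.paths = sorted(
            glob.glob(os.path.join(data_root, "**", "*.jpg"), recursive=True)
        )
        self.size = size

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        img = Image.open(self.paths[i]).convert("RGB")
        # Center-crop to a square, then resize; swap in whatever
        # augmentation you actually want to train with.
        s = min(img.size)
        left = (img.width - s) // 2
        top = (img.height - s) // 2
        img = img.crop((left, top, left + s, top + s))
        img = img.resize((self.size, self.size), Image.BICUBIC)
        arr = np.asarray(img, dtype=np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
        return {"image": arr}
```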

keyu-tian avatar Sep 03 '23 14:09 keyu-tian

Hi, did anyone figure this out?

mia01 avatar Sep 13 '23 22:09 mia01

@mia01 @Eudea @vvvm23 @seung-kim I think I'm training VQ-VAEs well on OpenImages, just with a random-crop augmentation (resize to 384, then random crop to 256) and normalizing pixels from [0, 1] to [-1, 1]. For fine-tuning I use lr=4e-4, batch_size=1024; for training from scratch I use lr=4e-6, batch_size=1024. I use the Adam optimizer with betas=(0.5, 0.9), following https://github.com/CompVis/taming-transformers/blob/3ba01b241669f5ade541ce990f7650a3b8f65318/taming/models/vqgan.py#L128.
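
Roughly, the setup looks like the sketch below (the model is a placeholder module, and "resize to 384" is read here as resizing the shorter side; the lr/betas just repeat the numbers above, not an official recipe).

```python
# Sketch of the augmentation and optimizer settings described above.
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(384),                      # shorter side -> 384
    transforms.RandomCrop(256),                  # random 256x256 crop
    transforms.ToTensor(),                       # uint8 [0, 255] -> float [0, 1]
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # [0, 1] -> [-1, 1]
])

model = torch.nn.Conv2d(3, 3, 1)  # placeholder; substitute the actual VQ-VAE / VQGAN

# betas=(0.5, 0.9) as in taming-transformers' VQGAN configure_optimizers;
# lr=4e-4 for fine-tuning, lr=4e-6 for from-scratch, per the numbers above.
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4, betas=(0.5, 0.9))
```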

keyu-tian avatar Oct 08 '23 07:10 keyu-tian


Hi @keyu-tian, I am curious about the distribution of image short-side lengths in OpenImages. The VAE is trained with the augmentation you describe (resize to 384, then random crop to 256); does that mean all images are downsampled to 384?
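
For context, this is the quick sketch I would use to check that distribution on a local copy (the directory path is a placeholder). Note that if the resize targets the shorter side, images with a short side below 384 would actually be upsampled rather than downsampled.

```python
# Histogram of image short-side lengths for a local OpenImages copy.
# The directory path is a placeholder; adjust it to your layout.
import glob
from collections import Counter

from PIL import Image

buckets = Counter()
for path in glob.glob("data/openimages/train/**/*.jpg", recursive=True):
    with Image.open(path) as img:          # reading the header is enough for .size
        short_side = min(img.size)
    buckets[short_side // 128 * 128] += 1  # 128-px-wide bins

for lo in sorted(buckets):
    print(f"{lo:5d}-{lo + 127:5d} px: {buckets[lo]}")
```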

wtliao avatar Feb 07 '24 00:02 wtliao


Hi @keyu-tian, I'm curious whether you've done any experiments with a VAE instead of a VQGAN? I get the impression the grid artifacts are hard to eliminate; should the discriminator loss weight be increased?
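
For reference, the knob I have in mind is disc_weight in the autoencoder's lossconfig. A sketch of how I would override it is below; the config path and key names assume the layout of this repo's configs/autoencoder/*.yaml files, and 0.75 is just an example value, not a recommendation.

```python
# Sketch: raising the discriminator loss weight via the training config.
# Assumes the lossconfig layout of this repo's KL autoencoder configs
# (LPIPSWithDiscriminator with a disc_weight parameter); adjust if yours differs.
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/autoencoder/autoencoder_kl_32x32x4.yaml")
loss_params = cfg.model.params.lossconfig.params
print("current disc_weight:", loss_params.disc_weight)

loss_params.disc_weight = 0.75  # example value only
OmegaConf.save(cfg, "configs/autoencoder/autoencoder_kl_32x32x4_hidisc.yaml")
```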

bu135 avatar Apr 22 '24 13:04 bu135