latent-diffusion
autoencoder for LDM
Hi! Could you add to the table which autoencoding model corresponds to which LDM, please? Maybe I am missing this information somewhere, but it isn't clear which one is for which.
@seung-kim I was struggling with this too. I ran the script scripts/download_first_stages.sh, which downloaded all the autoencoders. With each autoencoder there is a config.yaml file that says the training data was ldm.data.openimages.FullOpenImagesTrain. So it seems they were all trained on the OpenImages dataset?
@ablattmann @rromb could you please confirm this and also add the information to the README?
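For anyone who wants to check this themselves, here is a small sketch that prints the data.params.train.target entry of each downloaded config. It assumes the default models/first_stage_models/ layout produced by the download script; adjust the glob if your paths differ.

```python
# Inspect the downloaded first-stage configs and print which dataset class
# each autoencoder was trained on. The glob pattern is an assumption about
# where download_first_stages.sh unpacked the files on your machine.
import glob
import yaml

for path in sorted(glob.glob("models/first_stage_models/*/config.yaml")):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    # In these configs the training dataset sits under data.params.train.target.
    train_target = (
        cfg.get("data", {}).get("params", {}).get("train", {}).get("target", "unknown")
    )
    print(f"{path}: {train_target}")
```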
I have the same question; have you figured it out yet?
The class FullOpenImagesTrain does not exist; maybe the file ldm/data/openimages.py is missing. Could you check that, @rromb @ablattmann?
Hi, did anyone figure this out?
@mia01 @Eudea @vvvm23 @seung-kim I think I'm training VQVAEs well on OpenImages, just with a random-crop augmentation (resize to 384, then random crop to 256) and normalizing pixels from [0, 1] to [-1, 1]. For finetuning I use lr=4e-4, batch_size=1024; for training from scratch I use lr=4e-6, batch_size=1024. I use the Adam optimizer with betas=(0.5, 0.9), following https://github.com/CompVis/taming-transformers/blob/3ba01b241669f5ade541ce990f7650a3b8f65318/taming/models/vqgan.py#L128.
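In code form, that recipe looks roughly like the sketch below. This is only my reading of it: the model is a placeholder, and the exact transform/normalization code in the original run may differ.

```python
# Sketch of the recipe described above: short-side resize to 384, random 256x256
# crop, map pixels from [0, 1] to [-1, 1], and Adam with betas=(0.5, 0.9).
import torch
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize(384),                       # shorter side -> 384
    transforms.RandomCrop(256),                   # random 256x256 crop
    transforms.ToTensor(),                        # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],    # [0, 1] -> [-1, 1]
                         std=[0.5, 0.5, 0.5]),
])

model = torch.nn.Conv2d(3, 3, 1)  # placeholder module; substitute your VQVAE / first-stage autoencoder
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=4e-4,              # 4e-6 when training from scratch, per the post above
    betas=(0.5, 0.9),     # as in the linked taming-transformers VQGAN code
)
# Usage: x = train_transform(Image.open("some_openimages_file.jpg"))  # hypothetical path
```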
Hi @keyu-tian, I am curious about the distribution of short-side lengths in OpenImages. The VAE is trained with the augmentation (resize to 384, then random crop to 256); does that mean all images are downsampled to 384?
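For what it's worth, if the "resize to 384" step uses torchvision's Resize with a single integer (an assumption on my part), it resizes the shorter side to 384 while keeping the aspect ratio, so larger images are downsampled and smaller ones are upsampled:

```python
# Quick check of what Resize(384) does in torchvision: a single int scales the
# *shorter* side to that size and keeps the aspect ratio.
from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (1024, 768))   # dummy image standing in for an OpenImages sample
resized = transforms.Resize(384)(img)
print(resized.size)                   # (512, 384): shorter side is now 384
```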
Hi @keyu-tian. I'm curious whether you've done any experiments with a VAE instead of a VQGAN? I get the impression that the grid artifacts are hard to eliminate; should the discriminator loss weight be increased?