
Embedding dimensions of LDM-VQ models differ from VQGAN's

Open joanrod opened this issue 1 year ago • 0 comments

I noticed that the configuration of the VQ autoencoders in Latent Diffusion differs from the one used in VQGAN (taming-transformers). Specifically, embed_dim and z_channels have low values (3, 4, ...) in Latent Diffusion (https://github.com/CompVis/latent-diffusion/blob/a506df5756472e2ebaf9078affdde2c4f1502cd4/models/first_stage_models/vq-f8/config.yaml#L5), whereas in VQGAN they were much larger (256, 512) (https://github.com/CompVis/taming-transformers/blob/24268930bf1dce879235a7fddd0b2355b84d7ea6/configs/imagenet_vqgan.yaml#L5).
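For context, here is a minimal sketch (not the repos' exact code) of how the two parameters in question interact in a VQ autoencoder: the encoder output has z_channels channels, a 1x1 conv projects it to embed_dim channels, and the codebook lookup happens in that embed_dim-dimensional space, so embed_dim is the channel count of the latent that the diffusion model later operates on. The numeric values below are placeholders for illustration only.

```python
import torch
import torch.nn as nn

# Illustrative values: LDM's vq-f8 config uses small embed_dim/z_channels,
# VQGAN's imagenet config uses 256 for both.
z_channels = 4
embed_dim = 4
n_embed = 16384  # codebook size (illustrative)

encoder_out = torch.randn(1, z_channels, 32, 32)  # h = encoder(x)
quant_conv = nn.Conv2d(z_channels, embed_dim, 1)  # project to codebook dimension
codebook = nn.Embedding(n_embed, embed_dim)       # learned codebook entries

h = quant_conv(encoder_out)                       # (1, embed_dim, 32, 32)

# Nearest-codebook-entry lookup per spatial position (simplified)
flat = h.permute(0, 2, 3, 1).reshape(-1, embed_dim)
dists = torch.cdist(flat, codebook.weight)        # (N, n_embed) distances
indices = dists.argmin(dim=1)
z_q = codebook(indices).reshape(1, 32, 32, embed_dim).permute(0, 3, 1, 2)

print(z_q.shape)  # the quantized latent has embed_dim channels
```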

TL;DR, what is the reason that the Z embedding dimension is lower in Latent Diffusion? Thanks!

joanrod · Sep 04 '22