latent-diffusion
Question about hyperparameters
I have quite a lot of questions about this model. I've successfully trained latent diffusion on the AFHQ dataset, but I'm having a hard time understanding many of the hyperparameters in the YAML files.
In the autoencoder YAML (a rough sketch of such a config follows the list below):
- embed_dim: Why are we using embeddings in an autoencoder?
- n_embed: What is this?
- double_z: What is the purpose of this? I've noticed that it's True for the KL autoencoder and False for the VQ autoencoder. Why?
- ch: I know it means channels, but how does this change the model architecture?
- ch_mult: How does this work?
- lossconfig.target: This is set to taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator. Is it using a discriminator (like in a GAN)? Why does an autoencoder need a discriminator?
- lossconfig.params.disc_weight: Is that related to the discriminator in VQLPIPSWithDiscriminator, and how does it influence it?
- lossconfig.params.codebook_weight: What is a codebook weight in a VQ autoencoder?
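For context, these options come from the first-stage autoencoder config, which is shaped roughly like the sketch below. The values are placeholders rather than a copy of any specific config in the repo, and the comments are best-guess annotations, not confirmed answers.

```yaml
model:
  target: ldm.models.autoencoder.VQModel
  params:
    embed_dim: 3              # dimensionality of each latent / codebook vector
    n_embed: 8192             # number of codebook entries (VQ autoencoder only)
    ddconfig:
      double_z: false         # true in the KL autoencoder configs, false in the VQ ones
      z_channels: 3
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128                 # base channel count of the encoder/decoder
      ch_mult: [1, 2, 4]      # per-resolution channel multipliers applied to ch
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
    lossconfig:
      target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
      params:
        disc_start: 0
        disc_weight: 0.75     # weighting of the adversarial (discriminator) loss term
        codebook_weight: 1.0  # weighting of the vector-quantization loss term
```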
In the latent diffusion YAML (again, a sketch follows the list):
- first_stage_key: What is this? In every YAML file it's set to image.
- num_timesteps_cond: What does this do? In every file it's set to 1.
- log_every_t: How does this work?
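Similarly, a rough sketch of where these keys sit in a latent diffusion config. The values are again placeholders, and the comments are guesses rather than confirmed answers.

```yaml
model:
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    first_stage_key: image    # key of the data batch that is fed to the first-stage autoencoder
    num_timesteps_cond: 1
    log_every_t: 200          # how often (in diffusion steps) intermediate samples are logged
    timesteps: 1000
    image_size: 64
    channels: 3
    # first_stage_config and cond_stage_config (the autoencoder and conditioning
    # model configs) are nested here as well
```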
I would be grateful for any form of assistance. Thank you!
Hi, did you figure these out?
@bhosalems No, please update here if you find any answers. Thanks