
Question about hyperparameters

Open · pseudo-usama opened this issue on Jun 18, 2023 · 2 comments

I have quite a lot of questions about this model. I've successfully trained latent diffusion on the AFHQ dataset, but I'm having a hard time understanding many of the hyperparameters in the YAML files.

In the autoencoder YAML (see the sketch after this list):

  • embed_dim: Why are we using embeddings in an autoencoder?
  • n_embed: What is this?
  • double_z: What is the purpose of this? I've noticed that it's True for the KL autoencoder and False for the VQ autoencoder. Why?
  • ch: I know it means channels, but how does this change the model architecture?
  • ch_mult: How does this work?
  • lossconfig.target: This is set to taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator. Is it using a discriminator (like in a GAN)? Why does an autoencoder need a discriminator?
  • lossconfig.params.disc_weight: Is that related to the discriminator in VQLPIPSWithDiscriminator, and how does it influence it?
  • lossconfig.params.codebook_weight: What is a codebook weight in a VQ autoencoder?
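
For context, here's a rough sketch of the kind of autoencoder config section I'm asking about. The key names are the ones listed above; the target class path and all values are just illustrative placeholders, not copied from an actual config:

```yaml
# Rough sketch only -- placeholder values, structure as I understand it
model:
  target: ldm.models.autoencoder.VQModel   # VQ variant; the KL autoencoder uses a different class
  params:
    embed_dim: 3
    n_embed: 8192                # present in the VQ config
    ddconfig:
      double_z: false            # true in the KL config, false in the VQ config
      z_channels: 3
      resolution: 256
      in_channels: 3
      out_ch: 3
      ch: 128
      ch_mult: [1, 2, 4]
      num_res_blocks: 2
      attn_resolutions: []
      dropout: 0.0
    lossconfig:
      target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
      params:
        disc_start: 1
        disc_weight: 0.75
        codebook_weight: 1.0
```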

In the latent diffusion YAML (see the sketch after this list):

  • first_stage_key: What is this? In every YAML file it's set to image.
  • num_timesteps_cond: What does this do? In every file it's set to 1.
  • log_every_t: How does this work?
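
Similarly, a rough sketch of the part of the latent diffusion config these keys sit in (again, the class path and values are illustrative placeholders, not copied from an actual file):

```yaml
# Rough sketch only -- placeholder values
model:
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    first_stage_key: image     # set to image in every config I've looked at
    cond_stage_key: class_label
    num_timesteps_cond: 1      # always 1 in the files I've seen
    log_every_t: 200
    timesteps: 1000
    image_size: 64
    channels: 3
    # (first_stage_config / cond_stage_config / unet_config omitted here)
```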

I would be grateful for any form of assistance. Thank you!

pseudo-usama · Jun 18 '23 11:06

Hi, did you figure these out?

bhosalems · Jan 08 '24 02:01

@bhosalems No, please update here if you find any answers. Thanks

pseudo-usama · Jan 16 '24 16:01