denoising-diffusion-pytorch

Dumb Question about DM vs LDM

Open fujistoo opened this issue 1 year ago • 2 comments

Do they differ only in the use of a VAE to encode the inputs into an embedding (plus the conditional input part)? So if I wanted to run this in latent space, I'd wrap this whole thing within the VAE?

fujistoo avatar Apr 21 '23 10:04 fujistoo

@sztoo yea i'm trying to figure that out too, but i don't see why it should not work out of the box for VAE latent embedding space, as it is zero mean and unit variance
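
For anyone following along, here is a minimal pure-Python sketch of what "wrapping the diffusion in the VAE" could look like. It is only an illustration under stated assumptions: `encode` and `decode` are hypothetical stand-ins for a pretrained VAE (not this repo's API), and the latents are explicitly rescaled to unit variance before noising, since VAE latents are usually only approximately unit variance and latent diffusion codebases commonly apply a scale factor estimated from data.

```python
import math
import random

# Hypothetical stand-ins for a pretrained VAE encoder/decoder (not a real API).
def encode(x):
    return [0.5 * v for v in x]

def decode(z):
    return [v / 0.5 for v in z]

def latent_scale(latents):
    """Return 1/std over all latent values, so scaled latents have unit variance."""
    flat = [v for z in latents for v in z]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    return 1.0 / math.sqrt(var)

def q_sample(z0, alpha_bar_t, rng):
    """Forward diffusion on a latent: sqrt(abar) * z0 + sqrt(1 - abar) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in z0]

rng = random.Random(0)
# Toy "images": 64 vectors of 8 values each.
data = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(64)]

# 1) Encode with the (frozen) VAE, 2) rescale latents to unit variance.
latents = [encode(x) for x in data]
scale = latent_scale(latents)
z0 = [[v * scale for v in z] for z in latents]

# 3) The diffusion model is trained on noised latents instead of pixels.
z_t = q_sample(z0[0], alpha_bar_t=0.5, rng=rng)

# 4) After reverse sampling yields a clean latent, undo the scale and decode.
x_rec = decode([v / scale for v in z0[0]])
```

The denoiser itself is omitted; it would be trained to predict the noise added in step 3, exactly as in pixel space, with the VAE kept fixed on either side.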

i'm currently looking at naturalspeech2 and i'm confused why it is done post-quantization as opposed to Rombach et al. latent diffusion paper, where they do it before..

lucidrains avatar Apr 22 '23 17:04 lucidrains

It is briefly explained here: https://theaisummer.com/diffusion-models/

jS5t3r avatar Aug 02 '23 12:08 jS5t3r