
Question about the KL divergence loss

Open · marctimjen opened this issue 2 months ago • 1 comment

Hello

I hope someone can help me understand why the KL divergence is calculated as: `0.5 * torch.sum(torch.pow(self.mean, 2) + self.var - 1.0 - self.logvar, dim=[1, 2, 3])`

in the DiagonalGaussianDistribution defined here: https://github.com/CompVis/stable-diffusion/blob/21f890f9da3cfbeaba8e2ac3c425ee9e998d5229/ldm/modules/distributions/distributions.py#L44

I am asking because most VAE loss functions I can find use -1 times this calculation, like this: `0.5 * torch.sum(-torch.pow(self.mean, 2) - self.var + 1.0 + self.logvar, dim=[1, 2, 3])`

And I cannot see where this is multiplied by -1 in the contperceptual loss, for instance:

https://github.com/CompVis/stable-diffusion/blob/21f890f9da3cfbeaba8e2ac3c425ee9e998d5229/ldm/modules/losses/contperceptual.py#L83C57-L83C65
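For what it's worth, here is a minimal sketch (not the repo's code, just random `mean`/`logvar` tensors with a latent-like shape) checking how the two expressions relate:

```python
import torch

# Minimal sketch: random mean/logvar tensors shaped like autoencoder latents,
# comparing the two forms of the KL term for q = N(mean, var) vs p = N(0, I).
mean = torch.randn(2, 4, 8, 8)
logvar = torch.randn(2, 4, 8, 8)
var = torch.exp(logvar)

# Form used in DiagonalGaussianDistribution.kl(): KL(q || p), always >= 0.
kl = 0.5 * torch.sum(torch.pow(mean, 2) + var - 1.0 - logvar, dim=[1, 2, 3])

# Form I usually see in VAE write-ups: the (negative) KL as it appears
# inside the ELBO, which is maximized rather than minimized.
elbo_term = 0.5 * torch.sum(-torch.pow(mean, 2) - var + 1.0 + logvar, dim=[1, 2, 3])

print(torch.allclose(kl, -elbo_term))  # True: they differ only by sign
print(bool((kl >= 0).all()))           # True: the KL divergence is non-negative
```

So the two expressions are exact negatives of each other; I just want to confirm there is nothing more to the sign choice than that.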

Thank you very much in advance :)

marctimjen · May 08 '24 14:05

I found this material, which also has the loss in the same form that is used here:

https://pyimagesearch.com/2023/10/02/a-deep-dive-into-variational-autoencoders-with-pytorch/

My confusion just arose because most papers write:

[image: the KL term as written in Bishop's Deep Learning]

And the original implementation of the VAE:

[image: the KL term from the original VAE implementation]
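Writing out the KL between a diagonal Gaussian and a standard normal (my own restatement, not taken from either source) shows the two forms are just negatives of each other:

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \sigma^{2}) \,\|\, \mathcal{N}(0, 1)\right)
  = \frac{1}{2} \sum_{j} \left( \mu_j^{2} + \sigma_j^{2} - 1 - \log \sigma_j^{2} \right)
  = -\frac{1}{2} \sum_{j} \left( 1 + \log \sigma_j^{2} - \mu_j^{2} - \sigma_j^{2} \right)
```

So I think the sign difference just comes from whether the term is written as part of the ELBO (to be maximized) or added directly as a KL penalty to a loss (to be minimized), as in the repo.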

marctimjen · May 08 '24 14:05