Image-Super-Resolution-via-Iterative-Refinement
Why is the loss of Diffusion model calculated between “RANDOM noise” and “model predicted noise”?
Thanks a lot for your contribution and hard work. Why is the loss of the diffusion model calculated between the "RANDOM noise" and the "model-predicted noise", and not between the "actually added noise" and the "model-predicted noise"?
@egshkim I'm working with this repo too, so I'll give my two cents.

The `denoise_fn`, which is the U-Net, is not actually reconstructing an image from noise - rather, it is predicting the amount of noise added at each timestep. So `x_recon` is a misnomer - it really should be `noise_pred`.

In the inference step, `p_sample_loop` iteratively infers the amount of noise at each timestep and removes it from the previous noisy image. The code is a bit hard to follow, but I think the actual subtraction happens in `predict_start_from_noise`: https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement/blob/01d27a7cbfa8502be1d8dbd4ee02fcbd5e44389d/model/ddpm_modules/diffusion.py#L158

That function is called by `p_mean_variance`: https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement/blob/01d27a7cbfa8502be1d8dbd4ee02fcbd5e44389d/model/ddpm_modules/diffusion.py#L179 - and in that case `x_recon` is actually correctly named.

```python
x_recon = self.predict_start_from_noise(
    x, t=t, noise=self.denoise_fn(x, noise_level))
```

The U-Net (`denoise_fn`) predicts the noise at whichever timestep, and then `predict_start_from_noise` removes that noise from `x` to give you `x_recon`.
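For intuition, `predict_start_from_noise` essentially inverts the forward-noising equation x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε and solves for x_0. Here is a minimal sketch of that idea (my own paraphrase, not the repo's exact code, which precomputes these square-root buffers and handles the SR conditioning):

```python
import torch

def predict_start_from_noise(x_t, t, noise, alphas_cumprod):
    """Recover an estimate of x_0 from the noisy image x_t and predicted noise.

    Inverts x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    `alphas_cumprod` is the 1-D tensor of cumulative products of (1 - beta_t).
    """
    abar_t = alphas_cumprod[t]
    return (x_t - torch.sqrt(1.0 - abar_t) * noise) / torch.sqrt(abar_t)
```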
Thanks for your kind and detailed explanation. :) But actually, my question is about why random noise is used for the loss calculation, rather than where the denoising actually happens.
In my opinion, the noise actually added between step "t-1" and step "t" should be used for the loss calculation. But almost all currently available diffusion training code uses random noise for the loss calculation.
Yes - that comes from the original definition of the diffusion process in the DDPM paper: https://arxiv.org/pdf/2006.11239.pdf (specifically Equation 14).

The real loss function (Equations 3 and 5) does relate `x_{t-1}` to `x_t` via a variational lower bound, but that loss is intractable. The authors go through a derivation / simplification of it, which results in the "simple" L2 loss form (Eq. 14).
In the SR3 paper (https://arxiv.org/pdf/2104.07636.pdf) they also experiment with different loss norms (L1 vs. L2) and find that L1 loss gives better results.
I'm no mathematician, so the derivation is a bit hard to follow; I also welcome further explanations :)
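To make that concrete, here is a minimal sketch of what a DDPM-style training step does (a paraphrase; the names `denoise_fn` and `alphas_cumprod` mirror the repo's conventions, but this is not its exact `p_losses` code, which also concatenates the low-resolution conditioning image and uses continuous noise levels):

```python
import torch
import torch.nn.functional as F

def diffusion_loss(denoise_fn, x_0, alphas_cumprod, loss_type="l1"):
    """One training step of the simplified DDPM objective.

    The "random" noise eps sampled here is also the noise that is actually
    added to x_0 to build the noisy input x_t, so the loss compares the
    model's prediction against the noise that was really applied.
    """
    b = x_0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x_0.device)
    abar_t = alphas_cumprod[t].view(b, 1, 1, 1)

    eps = torch.randn_like(x_0)                                     # sample the noise once
    x_t = torch.sqrt(abar_t) * x_0 + torch.sqrt(1 - abar_t) * eps   # forward-noise x_0 with that same eps

    eps_pred = denoise_fn(x_t, t)                                   # U-Net predicts the noise
    if loss_type == "l1":
        return F.l1_loss(eps_pred, eps)                             # SR3 reports L1 works better
    return F.mse_loss(eps_pred, eps)                                # DDPM's Eq. 14 uses L2
```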
As far as I know, the `default` function returns the actual noise if it exists, and only falls back to random noise when no noise is passed in. I hope my answer can help you.
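For context, the `default` helper in this style of diffusion code is usually just a null-coalescing utility along these lines (a sketch, not necessarily this repo's exact definition):

```python
def default(val, d):
    # Return val if it was provided, otherwise fall back to d
    # (calling d if it is a function, e.g. a lambda that samples noise).
    if val is not None:
        return val
    return d() if callable(d) else d
```

In the training loss the call is typically something like `noise = default(noise, lambda: torch.randn_like(x_start))`, i.e. the freshly sampled noise is the same tensor that then gets added to `x_start`, which is why training against it is training against the actually added noise.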