Image-Super-Resolution-via-Iterative-Refinement
Why is the loss of Diffusion model calculated between “RANDOM noise” and “model predicted noise”?
Thanks a lot for your contribution and hard work. Why is the loss of the diffusion model calculated between the "RANDOM noise" and the "model-predicted noise", and not between the "actually added noise" and the "model-predicted noise"?
@egshkim I'm working with this repo too, so I'll give my two cents.

The `denoise_fn`, which is the U-Net, is not actually reconstructing an image from noise - rather, it is predicting the amount of noise added at each timestep. So `x_recon` is a misnomer - it really should be `noise_pred`.

In the inference step, `p_sample_loop` iteratively infers the amount of noise at each timestep and removes it from the previous noisy image. The code is a bit hard to follow, but I think the actual subtraction happens in `predict_start_from_noise`: https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement/blob/01d27a7cbfa8502be1d8dbd4ee02fcbd5e44389d/model/ddpm_modules/diffusion.py#L158

That function is called by `p_mean_variance`: https://github.com/Janspiry/Image-Super-Resolution-via-Iterative-Refinement/blob/01d27a7cbfa8502be1d8dbd4ee02fcbd5e44389d/model/ddpm_modules/diffusion.py#L179 - and in that case `x_recon` is actually correctly named.

```python
x_recon = self.predict_start_from_noise(
    x, t=t, noise=self.denoise_fn(x, noise_level))
```

The U-Net (`denoise_fn`) predicts the noise at whichever timestep, and then `predict_start_from_noise` removes that noise from `x` to give you `x_recon`.
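For intuition, `predict_start_from_noise` essentially inverts the forward-noising equation x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε and solves for x_0. Here is a minimal sketch of that idea (my own paraphrase, not the repo's exact code, which precomputes these square-root buffers and handles the SR conditioning):

```python
import torch

def predict_start_from_noise(x_t, t, noise, alphas_cumprod):
    """Recover an estimate of x_0 from the noisy image x_t and predicted noise.

    Inverts x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    `alphas_cumprod` is the 1-D tensor of cumulative products of (1 - beta_t).
    """
    abar_t = alphas_cumprod[t]
    return (x_t - torch.sqrt(1.0 - abar_t) * noise) / torch.sqrt(abar_t)
```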
Thanks for your kind and detailed explanation. :) But actually, my question is about why random noise is used for the loss calculation, rather than where the denoising actually happens.
In my opinion, the noise actually added between step "t-1" and step "t" should be used for the loss calculation. But almost all currently available diffusion training code uses random noise for the loss calculation.
Yes - that comes from the original definition of the diffusion process in the DDPM paper: https://arxiv.org/pdf/2006.11239.pdf (specifically Equation 14).

The real loss function (Equations 3 and 5) does relate `x_{t-1}` to `x_t` via a variational lower bound, but that loss is intractable. The authors go through a derivation / simplification of it, which results in the "simple" L2 loss form (Eq. 14).
In the SR3 paper (https://arxiv.org/pdf/2104.07636.pdf) they also experiment with different loss norms (L1 vs. L2) and find that L1 loss gives better results.
I'm no mathematician, so the derivation is a bit hard to follow; I also welcome further explanations :)
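To make that concrete, here is a minimal sketch of what a DDPM-style training step does (a paraphrase; the names `denoise_fn` and `alphas_cumprod` mirror the repo's conventions, but this is not its exact `p_losses` code, which also concatenates the low-resolution conditioning image and uses continuous noise levels):

```python
import torch
import torch.nn.functional as F

def diffusion_loss(denoise_fn, x_0, alphas_cumprod, loss_type="l1"):
    """One training step of the simplified DDPM objective.

    The "random" noise eps sampled here is also the noise that is actually
    added to x_0 to build the noisy input x_t, so the loss compares the
    model's prediction against the noise that was really applied.
    """
    b = x_0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x_0.device)
    abar_t = alphas_cumprod[t].view(b, 1, 1, 1)

    eps = torch.randn_like(x_0)                                     # sample the noise once
    x_t = torch.sqrt(abar_t) * x_0 + torch.sqrt(1 - abar_t) * eps   # forward-noise x_0 with that same eps

    eps_pred = denoise_fn(x_t, t)                                   # U-Net predicts the noise
    if loss_type == "l1":
        return F.l1_loss(eps_pred, eps)                             # SR3 reports L1 works better
    return F.mse_loss(eps_pred, eps)                                # DDPM's Eq. 14 uses L2
```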
As far as I know, the `default` function returns the actual noise if it exists, and only falls back to random noise when no noise is passed in. I hope my answer can help you.
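For context, the `default` helper in this style of diffusion code is usually just a null-coalescing utility along these lines (a sketch, not necessarily this repo's exact definition):

```python
def default(val, d):
    # Return val if it was provided, otherwise fall back to d
    # (calling d if it is a function, e.g. a lambda that samples noise).
    if val is not None:
        return val
    return d() if callable(d) else d
```

In the training loss the call is typically something like `noise = default(noise, lambda: torch.randn_like(x_start))`, i.e. the freshly sampled noise is the same tensor that then gets added to `x_start`, which is why training against it is training against the actually added noise.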