denoising-diffusion-pytorch
when do we use objective "pred_x_start"?
Hi diffusion developers,
Thank you for the open source development!
I have a naive question about the objective "pred_x_start". If we use this objective, then after training we have a model that can directly denoise from any timestep x_t to x_0. In that case, what is the purpose of the reverse diffusion process with >1 timesteps?
There are essentially two possible outcomes after training:
- We have a well-trained and PERFECT denoising model that always gives the ideal x_0. In that case the reverse diffusion seems like a waste of time, repeatedly adding noise back onto the perfect x_0 at each timestep.
- We have a regular denoising model that gives an approximately optimal x_0. However, the reverse diffusion keeps adding noise during the process. This is like adding extra error (noise) on top of the existing approximation error, which seems to make the model's job even harder (see the posterior step written out after this list).
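For concreteness, this is the reverse step in question, in the standard DDPM parameterization (Ho et al. 2020, Eqs. 6 and 7), with the model's prediction $\hat{x}_0$ substituted for the true $x_0$. Note that it re-injects Gaussian noise via $\tilde{\beta}_t$ even though $x_0$ was just predicted:

```math
q(x_{t-1} \mid x_t, \hat{x}_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, \hat{x}_0),\ \tilde{\beta}_t I\right),
\quad
\tilde{\mu}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,\hat{x}_0
 + \frac{\sqrt{\alpha_t}\,\bigl(1-\bar{\alpha}_{t-1}\bigr)}{1-\bar{\alpha}_t}\,x_t,
\quad
\tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t
```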
Best, Leo
To clarify: p(x_{t-1} | x_t) is the denoising step in most papers, which may be unnecessary as discussed above. As Eq. 9 in https://arxiv.org/pdf/2107.00630.pdf suggests, a model well trained with "pred_x_start" is already capable of producing x_0 directly from x_t.
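To make the contrast concrete, here is a minimal sketch of the two ways a pred_x_start model can be used at inference. This is not this repo's exact code (`model`, the linear beta schedule, and the shapes are assumptions), just standard DDPM ancestral sampling driven by an x_0 prediction:

```python
import torch

# Assumed interface: model(x_t, t) -> predicted x_0 (the pred_x_start objective).
# Standard linear beta schedule from DDPM; not necessarily this repo's defaults.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def one_shot_denoise(model, x_t, t):
    """Single forward pass: trust the network's x_0 prediction directly."""
    batch_t = torch.full((x_t.shape[0],), t, dtype=torch.long)
    return model(x_t, batch_t)

@torch.no_grad()
def ancestral_sample(model, shape):
    """Iterative reverse diffusion: at every step, predict x_0, then sample
    x_{t-1} from the posterior q(x_{t-1} | x_t, x0_hat). This is where noise
    gets re-injected even though x_0 has already been predicted."""
    x_t = torch.randn(shape)
    for t in reversed(range(T)):
        x0_hat = one_shot_denoise(model, x_t, t)
        abar_t = alphas_bar[t]
        abar_prev = alphas_bar[t - 1] if t > 0 else torch.tensor(1.0)
        # Posterior mean (Ho et al. 2020, Eq. 7), with x0_hat in place of x_0.
        mean = (abar_prev.sqrt() * betas[t] / (1 - abar_t)) * x0_hat \
             + (alphas[t].sqrt() * (1 - abar_prev) / (1 - abar_t)) * x_t
        if t > 0:
            var = (1 - abar_prev) / (1 - abar_t) * betas[t]
            x_t = mean + var.sqrt() * torch.randn_like(x_t)  # fresh noise each step
        else:
            x_t = mean
    return x_t
```

Note that the posterior mean puts only a small weight on x0_hat when t is large, so the loop commits to the model's guess gradually instead of all at once; that partial trust is the usual argument for keeping multiple reverse steps when the model is only approximately correct.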
I had the same question. I read a couple of blog posts, but none of them clarifies this issue.
Hello, did you figure it out? I have the same question too, and I don't know which paper I should read :(
We can use the objective "pred_x_start" for self-conditioning. As described in the annotated code: "if doing self-conditioning, 50% of the time, predict x_start from current set of times and condition with unet with that; this technique will slow down training by 25%, but seems to lower FID significantly".
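For anyone else landing here, a minimal sketch of what that comment describes during a training step; the `unet(x_t, t, x_self_cond)` signature is an assumption for illustration (the actual repo concatenates the condition to the input channels):

```python
import torch

def predict_with_self_conditioning(unet, x_t, t):
    """Self-conditioning as in the comment above: half the time, first make a
    gradient-free x_0 estimate, then feed it back in as extra conditioning."""
    x_self_cond = None
    if torch.rand(()) < 0.5:
        with torch.no_grad():
            # First pass: rough x_0 estimate from the same (x_t, t), no gradient.
            x_self_cond = unet(x_t, t, x_self_cond=None).detach()
    # Second pass (the one that gets trained): condition on the model's own estimate.
    return unet(x_t, t, x_self_cond=x_self_cond)
```

At sampling time the same mechanism reuses the previous step's x_0 prediction as the condition for the next step, so the model keeps refining its own estimate along the reverse trajectory.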