[Q] Stable diffusion: Predict previous noisy latent in `generate_image`

Open fdtomasi opened this issue 2 years ago • 0 comments

Hello! I have been using stable diffusion and adapted a few classes to my use case. However, I do not understand the step that predicts the previous noisy latent variable here: https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion/stable_diffusion.py#L227C16-L232

pred_x0 = (latent_prev - math.sqrt(1 - a_t) * latent) / math.sqrt(a_t)
latent = (latent * math.sqrt(1.0 - a_prev) + math.sqrt(a_prev) * pred_x0)

I understand the two equations as first predicting the original sample x_0 (equivalent to the following, found in noise_scheduler.step():)

pred_original_sample = (
    sample - math.sqrt(beta_prod) * model_output
) / math.sqrt(alpha_prod)

and secondly as predicting x_{t-1} from x_0 and latent. However, the latent on the right-hand side of the second equation is the noise predicted at step t (it is the output of the diffusion model given latent_prev), whereas the latent on the left-hand side is the sample x_{t-1}. Is there a reason why the predicted noise is reused to generate the previous sample x_{t-1} as well?
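To make my reading concrete, here is a minimal NumPy sketch of the two lines. It assumes that latent_prev is the noisy sample x_t and that latent (renamed eps_pred here) is the predicted noise. The function name ddim_style_step and the values of a_t and a_prev are made up for illustration and are not taken from keras-cv.

```python
import math

import numpy as np


def ddim_style_step(latent_prev, eps_pred, a_t, a_prev):
    """Hypothetical helper mirroring the two lines quoted above.

    latent_prev: noisy sample x_t
    eps_pred:    noise predicted by the model at step t (assumed reading)
    a_t, a_prev: cumulative alpha products at steps t and t-1
    """
    # Invert x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps to estimate x_0.
    pred_x0 = (latent_prev - math.sqrt(1.0 - a_t) * eps_pred) / math.sqrt(a_t)
    # Re-noise the x_0 estimate to level t-1, reusing the same predicted noise.
    latent = math.sqrt(a_prev) * pred_x0 + math.sqrt(1.0 - a_prev) * eps_pred
    return pred_x0, latent


# Toy check with a "perfect" noise prediction (illustrative values only).
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 4))
eps = rng.normal(size=(4, 4))
a_t, a_prev = 0.5, 0.8

x_t = math.sqrt(a_t) * x0 + math.sqrt(1.0 - a_t) * eps
pred_x0, x_prev = ddim_style_step(x_t, eps, a_t, a_prev)
```

If the noise prediction were exact, pred_x0 would recover x0, and x_prev would be x0 noised to level t-1 with the same eps. This looks to me like a deterministic, DDIM-style update with no fresh noise added, but I am not sure that is the intent.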

Relatedly, is there a reason not to use latent = noise_scheduler.step(...) directly? https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/stable_diffusion/noise_scheduler.py#L107 Besides being easier to understand, it would automatically generalise to other alphas (which are currently fixed in _ALPHAS_CUMPROD).

Thank you!

Aug 24 '23 13:08 fdtomasi