Difference between DDIMScheduler and CogVideoXDDIMScheduler
Hi, I noticed that one of the differences between these schedulers is that in CogVideoXDDIMScheduler the equation for the latent update has been changed to:
a_t = ((1 - alpha_prod_t_prev) / (1 - alpha_prod_t)) ** 0.5
b_t = alpha_prod_t_prev**0.5 - alpha_prod_t**0.5 * a_t
prev_sample = a_t * sample + b_t * pred_original_sample
in line 391 - https://github.com/huggingface/diffusers/blob/89e4d6219805975bd7d253a267e1951badc9f1c0/src/diffusers/schedulers/scheduling_ddim_cogvideox.py#L391
instead of:
# 6. compute "direction pointing to x_t" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
pred_sample_direction = (1 - alpha_prod_t_prev - std_dev_t**2) ** (0.5) * pred_epsilon
# 7. compute x_t without "random noise" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
prev_sample = alpha_prod_t_prev ** (0.5) * pred_original_sample + pred_sample_direction
from DDIMScheduler in line 448 - https://github.com/huggingface/diffusers/blob/89e4d6219805975bd7d253a267e1951badc9f1c0/src/diffusers/schedulers/scheduling_ddim.py#L448
Can you explain the reasoning behind this modification? Is there a reference to a specific equation in a paper?
This would help a lot, thank you!
same question
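CogVideoXDDIMScheduler is equivalent to step 7 in DDIMScheduler, compute x_t without "random noise". Just set std_dev_t=0, so there is no randomness. Then substitute pred_epsilon = (sample - alpha_prod_t**0.5 * pred_original_sample) / (1 - alpha_prod_t)**0.5 into the DDIMScheduler update and collect the terms in sample and pred_original_sample: the coefficient of sample is a_t and the coefficient of pred_original_sample is b_t. It's just a different way of writing the same formula.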
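For anyone who wants to check this numerically, here is a minimal sketch in plain Python on scalar values. It is not the diffusers code: the function names are made up for illustration, and it assumes that pred_epsilon and pred_original_sample are consistent with each other (no clipping or thresholding applied to only one of them) and that std_dev_t=0.

```python
import math
import random


def ddim_step_standard(sample, pred_x0, alpha_t, alpha_prev, std_dev_t=0.0):
    """DDIMScheduler-style update (formula 12), without the added noise term."""
    # Recover the predicted noise implied by the sample and the predicted x_0.
    pred_eps = (sample - math.sqrt(alpha_t) * pred_x0) / math.sqrt(1.0 - alpha_t)
    # "Direction pointing to x_t".
    direction = math.sqrt(1.0 - alpha_prev - std_dev_t**2) * pred_eps
    return math.sqrt(alpha_prev) * pred_x0 + direction


def ddim_step_cogvideox(sample, pred_x0, alpha_t, alpha_prev):
    """CogVideoXDDIMScheduler-style update written with a_t and b_t."""
    a_t = math.sqrt((1.0 - alpha_prev) / (1.0 - alpha_t))
    b_t = math.sqrt(alpha_prev) - math.sqrt(alpha_t) * a_t
    return a_t * sample + b_t * pred_x0


def max_abs_difference(num_trials=1000, seed=0):
    """Largest absolute gap between the two updates over random scalar inputs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_trials):
        # Hypothetical cumulative alphas with alpha_t <= alpha_prev < 1.
        alpha_t = rng.uniform(0.01, 0.98)
        alpha_prev = rng.uniform(alpha_t, 0.99)
        sample = rng.gauss(0.0, 1.0)
        pred_x0 = rng.gauss(0.0, 1.0)
        standard = ddim_step_standard(sample, pred_x0, alpha_t, alpha_prev)
        cogvideox = ddim_step_cogvideox(sample, pred_x0, alpha_t, alpha_prev)
        worst = max(worst, abs(standard - cogvideox))
    return worst


if __name__ == "__main__":
    print("max abs difference:", max_abs_difference())
```

The two updates should agree up to floating-point rounding. With a nonzero std_dev_t they would no longer match, since the a_t / b_t form has no term for it.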