CogVideo Difference between DDIMScheduler and CogVideoXDDIMScheduler

Hi, I noticed that one of the differences between these schedulers is that in CogVideoXDDIMScheduler, the equation for latent update has been modified to:

 a_t = ((1 - alpha_prod_t_prev) / (1 - alpha_prod_t)) ** 0.5
 b_t = alpha_prod_t_prev**0.5 - alpha_prod_t**0.5 * a_t
 prev_sample = a_t * sample + b_t * pred_original_sample

in line 391 - https://github.com/huggingface/diffusers/blob/89e4d6219805975bd7d253a267e1951badc9f1c0/src/diffusers/schedulers/scheduling_ddim_cogvideox.py#L391

instead of:

# 6. compute "direction pointing to x_t" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
pred_sample_direction = (1 - alpha_prod_t_prev - std_dev_t**2) ** (0.5) * pred_epsilon

# 7. compute x_t without "random noise" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
prev_sample = alpha_prod_t_prev ** (0.5) * pred_original_sample + pred_sample_direction

from DDIMScheduler in line 448 - https://github.com/huggingface/diffusers/blob/89e4d6219805975bd7d253a267e1951badc9f1c0/src/diffusers/schedulers/scheduling_ddim.py#L448

Could you explain the reasoning behind this modification, or point to the specific equation in a paper that it comes from?

This would help a lot. Thank you!

Dec 11 '24 12:12 DanahYatim

same question

Jan 08 '25 14:01 nini0919

same question

Jan 14 '25 15:01 1248818919

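The CogVideoXDDIMScheduler update is equivalent to step 7 of DDIMScheduler (compute x_t without "random noise") with std_dev_t = 0, i.e. no randomness. Substitute pred_epsilon = (sample - alpha_prod_t**0.5 * pred_original_sample) / (1 - alpha_prod_t)**0.5 into step 7 and collect terms: the coefficient on sample is a_t and the coefficient on pred_original_sample is b_t. It's just a different way of writing the same formula.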
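
A quick numerical check of that equivalence, as a minimal sketch rather than the actual diffusers code. The two function names are hypothetical, the alpha values are arbitrary, and pred_epsilon is recovered from sample and pred_original_sample through the usual DDIM relation sample = alpha_prod_t**0.5 * x_0 + (1 - alpha_prod_t)**0.5 * epsilon:

```python
import numpy as np


def ddim_step_standard(sample, pred_original_sample, alpha_prod_t,
                       alpha_prod_t_prev, std_dev_t=0.0):
    """DDIMScheduler-style update (steps 6 and 7) with std_dev_t = 0 by default."""
    # Recover epsilon from x_t and the predicted x_0 (standard DDIM relation).
    pred_epsilon = (sample - alpha_prod_t**0.5 * pred_original_sample) / (1 - alpha_prod_t) ** 0.5
    # "Direction pointing to x_t"
    pred_sample_direction = (1 - alpha_prod_t_prev - std_dev_t**2) ** 0.5 * pred_epsilon
    # x_{t-1} without the random noise term
    return alpha_prod_t_prev**0.5 * pred_original_sample + pred_sample_direction


def ddim_step_cogvideox(sample, pred_original_sample, alpha_prod_t, alpha_prod_t_prev):
    """CogVideoXDDIMScheduler-style update written with a_t and b_t."""
    a_t = ((1 - alpha_prod_t_prev) / (1 - alpha_prod_t)) ** 0.5
    b_t = alpha_prod_t_prev**0.5 - alpha_prod_t**0.5 * a_t
    return a_t * sample + b_t * pred_original_sample


# Arbitrary test inputs (assumed shapes and values, for illustration only).
rng = np.random.default_rng(0)
sample = rng.standard_normal((2, 4, 8, 8))
pred_original_sample = rng.standard_normal((2, 4, 8, 8))
alpha_prod_t, alpha_prod_t_prev = 0.5, 0.7

out_standard = ddim_step_standard(sample, pred_original_sample, alpha_prod_t, alpha_prod_t_prev)
out_cogvideox = ddim_step_cogvideox(sample, pred_original_sample, alpha_prod_t, alpha_prod_t_prev)

max_abs_diff = float(np.abs(out_standard - out_cogvideox).max())
print("max abs difference:", max_abs_diff)
```

The two outputs should agree up to floating-point rounding. Passing a nonzero std_dev_t to the first function breaks the match, which is the sense in which the a_t / b_t form corresponds to the deterministic (no added noise) case.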

Jan 15 '25 05:01 tengjiayan20