Possible implementation error of v_prediction in the Euler and Euler Ancestral schedulers
I came across a potential inconsistency in the implementation of v_prediction within the Euler scheduler when examining the source code. Specifically, in the computation of pred_original_sample, the formula is given as:
pred_original_sample = model_output * (-sigma / (sigma**2 + 1) ** 0.5) + (sample / (sigma**2 + 1))
Where sigma is calculated as:
sigmas = np.array(((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5)
Substituting the definition of sigma into the equation for pred_original_sample, we arrive at:
pred_original_sample = sample * alphas_cumprod - model_output * ((1-alphas_cumprod) ** 0.5)
Here, it appears that sample * alphas_cumprod is missing a square root on alphas_cumprod. Shouldn't this be sample * (alphas_cumprod**0.5)? This seems inconsistent with the DDIM scheduler, which computes:
(alpha_prod_t**0.5) * sample - (beta_prod_t**0.5) * model_output.
Could this be an error?
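For reference, the substitution above can be checked numerically. The snippet below is just a sanity check with arbitrary test values (alphas_cumprod = 0.7 is an example, not a value from the scheduler):

```python
import numpy as np

# Arbitrary test values (not taken from any real scheduler config).
rng = np.random.default_rng(0)
alphas_cumprod = 0.7
sigma = ((1 - alphas_cumprod) / alphas_cumprod) ** 0.5
sample = rng.standard_normal(4)
model_output = rng.standard_normal(4)

# Euler scheduler's v_prediction formula, as quoted above
euler = model_output * (-sigma / (sigma**2 + 1) ** 0.5) + sample / (sigma**2 + 1)

# After substituting sigma = sqrt((1 - a) / a), using sigma**2 + 1 = 1 / a:
substituted = sample * alphas_cumprod - model_output * (1 - alphas_cumprod) ** 0.5

assert np.allclose(euler, substituted)
```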
hi! thanks for the issue
note that sample is scaled with 1/ ((sigma**2 + 1) ** 0.5) for euler scheduler
self.scheduler.scale_model_input() is always called inside pipeline at each step
https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L1047
here is how scale_model_input is implemented for euler scheduler
https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/src/diffusers/schedulers/scheduling_euler_ancestral_discrete.py#L255
for ddim we do not scale sample https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/src/diffusers/schedulers/scheduling_ddim.py#L238
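To make the point concrete: the Euler scheduler keeps its latent at a scale sqrt(sigma**2 + 1) larger than DDIM's VP-scaled latent, and scale_model_input removes that factor before the UNet call. A minimal numeric sketch (hypothetical values, assuming the same v prediction in both cases since the model input is identical):

```python
import numpy as np

# Hypothetical test values; only the algebra is being checked here.
rng = np.random.default_rng(1)
alpha = 0.7                                          # alphas_cumprod at some timestep
sigma = ((1 - alpha) / alpha) ** 0.5
ddim_sample = rng.standard_normal(4)                 # VP-scaled latent, as DDIM stores it
euler_sample = ddim_sample * (sigma**2 + 1) ** 0.5   # Euler's internal latent scale
model_output = rng.standard_normal(4)                # same v prediction either way

# scale_model_input undoes the extra scale, so the UNet sees the same input
assert np.allclose(euler_sample / (sigma**2 + 1) ** 0.5, ddim_sample)

# Euler's v_prediction formula applied to its own (unscaled) latent ...
euler_pred = model_output * (-sigma / (sigma**2 + 1) ** 0.5) + euler_sample / (sigma**2 + 1)
# ... matches DDIM's formula applied to the VP latent
ddim_pred = alpha**0.5 * ddim_sample - (1 - alpha) ** 0.5 * model_output
assert np.allclose(euler_pred, ddim_pred)
```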
Thank you! I hadn't noticed this feature of the Euler scheduler. Now I fully understand.
Hi, may I ask what exactly does scale_model_input do in the EulerDiscreteScheduler?
I've checked the original EDM paper and the k-diffusion implementation: the scaling coefficient $s(t)$ is 1 in the paper, and the model input in k-diffusion's sample_euler receives no extra scaling:
for i in trange(len(sigmas) - 1, disable=disable):
    gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
    eps = torch.randn_like(x) * s_noise
    sigma_hat = sigmas[i] * (gamma + 1)
    if gamma > 0:
        x = x + eps * (sigma_hat ** 2 - sigmas[i] ** 2) ** 0.5
    denoised = model(x, sigma_hat * s_in, **extra_args)  # no extra scaling for x
But in the diffusers implementation, the input latent is scaled by 1 / ((sigma**2 + 1) ** 0.5). May I ask the rationale behind this?
Thanks!
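One possible reconciliation, offered as an assumption rather than an answer: in k-diffusion the scaling may simply live inside model() rather than being absent, since the denoiser wrappers for DDPM-style models multiply their input by a preconditioning factor c_in = 1 / sqrt(sigma**2 + 1) internally, which is the same factor diffusers exposes as scale_model_input. A trivial sketch of that equivalence (hypothetical values):

```python
import numpy as np

# Assumption being illustrated: k-diffusion's model() wrapper applies
# c_in = 1 / sqrt(sigma**2 + 1) internally, while diffusers applies the
# same factor externally via scale_model_input. If so, the UNet receives
# the same tensor either way.
sigma = 1.7
x = np.random.default_rng(2).standard_normal(4)      # current latent

c_in = 1.0 / (sigma**2 + 1) ** 0.5

unet_input_kdiffusion = x * c_in                     # applied inside model(x, sigma)
unet_input_diffusers = x / (sigma**2 + 1) ** 0.5     # applied via scale_model_input

assert np.allclose(unet_input_kdiffusion, unet_input_diffusers)
```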