Possible implementation error of v_prediction in the Euler and Euler Ancestral schedulers
I came across a potential inconsistency in the implementation of v_prediction within the Euler scheduler when examining the source code. Specifically, in the computation of pred_original_sample, the formula is given as:
pred_original_sample = model_output * (-sigma / (sigma**2 + 1) ** 0.5) + (sample / (sigma**2 + 1))
Where sigma is calculated as:
sigmas = np.array(((1 - self.alphas_cumprod) / self.alphas_cumprod) ** 0.5)
Substituting the definition of sigma into the equation for pred_original_sample, we arrive at:
pred_original_sample = sample * alphas_cumprod - model_output * ((1-alphas_cumprod) ** 0.5)
Here, it appears that sample * alphas_cumprod is missing a square root on alphas_cumprod. Shouldn't this be sample * (alphas_cumprod**0.5)? This seems inconsistent with the DDIM scheduler, which computes:
(alpha_prod_t**0.5) * sample - (beta_prod_t**0.5) * model_output.
Could this be an error?
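For reference, the substitution above can be checked numerically. The snippet below is just a sanity check with arbitrary test values (alphas_cumprod = 0.7 is an example, not a value from the scheduler):

```python
import numpy as np

# Arbitrary test values (not taken from any real scheduler config).
rng = np.random.default_rng(0)
alphas_cumprod = 0.7
sigma = ((1 - alphas_cumprod) / alphas_cumprod) ** 0.5
sample = rng.standard_normal(4)
model_output = rng.standard_normal(4)

# Euler scheduler's v_prediction formula, as quoted above
euler = model_output * (-sigma / (sigma**2 + 1) ** 0.5) + sample / (sigma**2 + 1)

# After substituting sigma = sqrt((1 - a) / a), using sigma**2 + 1 = 1 / a:
substituted = sample * alphas_cumprod - model_output * (1 - alphas_cumprod) ** 0.5

assert np.allclose(euler, substituted)
```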
hi! thanks for the issue
note that sample is scaled with 1/ ((sigma**2 + 1) ** 0.5) for euler scheduler
self.scheduler.scale_model_input() is always called inside pipeline at each step
https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L1047
here is how scale_model_input is implemented for euler scheduler
https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/src/diffusers/schedulers/scheduling_euler_ancestral_discrete.py#L255
for ddim we do not scale sample https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/src/diffusers/schedulers/scheduling_ddim.py#L238
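To make the point concrete: the Euler scheduler keeps its latent at a scale sqrt(sigma**2 + 1) larger than DDIM's VP-scaled latent, and scale_model_input removes that factor before the UNet call. A minimal numeric sketch (hypothetical values, assuming the same v prediction in both cases since the model input is identical):

```python
import numpy as np

# Hypothetical test values; only the algebra is being checked here.
rng = np.random.default_rng(1)
alpha = 0.7                                          # alphas_cumprod at some timestep
sigma = ((1 - alpha) / alpha) ** 0.5
ddim_sample = rng.standard_normal(4)                 # VP-scaled latent, as DDIM stores it
euler_sample = ddim_sample * (sigma**2 + 1) ** 0.5   # Euler's internal latent scale
model_output = rng.standard_normal(4)                # same v prediction either way

# scale_model_input undoes the extra scale, so the UNet sees the same input
assert np.allclose(euler_sample / (sigma**2 + 1) ** 0.5, ddim_sample)

# Euler's v_prediction formula applied to its own (unscaled) latent ...
euler_pred = model_output * (-sigma / (sigma**2 + 1) ** 0.5) + euler_sample / (sigma**2 + 1)
# ... matches DDIM's formula applied to the VP latent
ddim_pred = alpha**0.5 * ddim_sample - (1 - alpha) ** 0.5 * model_output
assert np.allclose(euler_pred, ddim_pred)
```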
Thank you! I hadn't noticed this feature of the Euler scheduler. Now I fully understand.
Hi, may I ask what exactly does scale_model_input do in the EulerDiscreteScheduler?
I've checked the original EDM paper and the k-diffusion implementation: the scaling coefficient $s(t)$ is 1 in the paper, and the model input in k-diffusion's sample_euler receives no extra scaling:
for i in trange(len(sigmas) - 1, disable=disable):
    gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
    eps = torch.randn_like(x) * s_noise
    sigma_hat = sigmas[i] * (gamma + 1)
    if gamma > 0:
        x = x + eps * (sigma_hat ** 2 - sigmas[i] ** 2) ** 0.5
    denoised = model(x, sigma_hat * s_in, **extra_args)  # no extra scaling for x
But in the diffusers implementation, the input latent is scaled by 1 / ((sigma**2 + 1) ** 0.5). May I ask the rationale behind this?
Thanks!
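One possible reconciliation, offered as an assumption rather than an answer: in k-diffusion the scaling may simply live inside model() rather than being absent, since the denoiser wrappers for DDPM-style models multiply their input by a preconditioning factor c_in = 1 / sqrt(sigma**2 + 1) internally, which is the same factor diffusers exposes as scale_model_input. A trivial sketch of that equivalence (hypothetical values):

```python
import numpy as np

# Assumption being illustrated: k-diffusion's model() wrapper applies
# c_in = 1 / sqrt(sigma**2 + 1) internally, while diffusers applies the
# same factor externally via scale_model_input. If so, the UNet receives
# the same tensor either way.
sigma = 1.7
x = np.random.default_rng(2).standard_normal(4)      # current latent

c_in = 1.0 / (sigma**2 + 1) ** 0.5

unet_input_kdiffusion = x * c_in                     # applied inside model(x, sigma)
unet_input_diffusers = x / (sigma**2 + 1) ** 0.5     # applied via scale_model_input

assert np.allclose(unet_input_kdiffusion, unet_input_diffusers)
```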