
DDIM produces incorrect samples with SDXL (epsilon or v-prediction)

Open bghira opened this issue 1 year ago • 37 comments

Describe the bug

When generating images with SDXL and DDIM, there is some residual noise in the outputs.

This leads to a "smudgy" look, and in cases where fewer steps are used, DDIM and Euler diverge a lot more than they should because of the cumulative impact of not having the timesteps aligned properly.

In some brief tests, it looks like simply adding an extra timestep with a zero sigma to the end of the schedule resolves the problem.
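
For illustration, the workaround amounts to something like the helper below (a hypothetical sketch, not a diffusers API; per the discussion further down, the patched Euler schedule already ends with a zero sigma, which is what DDIM's schedule appears to be missing):

import torch

def append_zero_sigma(sigmas: torch.Tensor) -> torch.Tensor:
    # Ensure the sigma schedule ends with an explicit 0.0 so the final step denoises completely.
    if sigmas[-1] != 0:
        sigmas = torch.cat([sigmas, sigmas.new_zeros(1)])
    return sigmas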

Reproduction

This script uses a modified Euler scheduler to create fully-denoised images:

import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

model_id = "ptx0/terminus-xl-gamma-training"
pipe = StableDiffusionXLPipeline.from_pretrained(model_id, add_watermarker=False, torch_dtype=torch.bfloat16).to("cuda")
generator = torch.Generator("cuda").manual_seed(420420420)

prompt = "the artful dodger, cool dog in sunglasses sitting on a recliner in the dark, with the white noise reflecting on his sunglasses"
num_inference_steps = 30
guidance_scale = 7.5
def rescale_zero_terminal_snr_sigmas(sigmas):
    sigmas = sigmas.flip(0)
    alphas_cumprod = 1 / ((sigmas * sigmas) + 1)
    alphas_bar_sqrt = alphas_cumprod.sqrt()

    # Store old values.
    alphas_bar_sqrt_0 = alphas_bar_sqrt[0].clone()
    alphas_bar_sqrt_T = alphas_bar_sqrt[-1].clone()

    # Shift so the last timestep is zero.
    alphas_bar_sqrt -= (alphas_bar_sqrt_T)

    # Scale so the first timestep is back to the old value.
    alphas_bar_sqrt *= alphas_bar_sqrt_0 / (alphas_bar_sqrt_0 - alphas_bar_sqrt_T)

    # Convert alphas_bar_sqrt back to sigmas
    alphas_bar = alphas_bar_sqrt**2  # Revert sqrt
    # Clamp the final (zero) alpha_bar to a tiny epsilon so the terminal sigma stays finite (but huge).
    alphas_bar[-1] = 4.8973451890853435e-08
    sigmas = ((1 - alphas_bar) / alphas_bar) ** 0.5
    return sigmas.flip(0)


zsnr = getattr(pipe.scheduler.config, 'rescale_betas_zero_snr', False)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
if zsnr:
    # Wrap set_timesteps so the sigmas get rescaled for zero terminal SNR after every call.
    tsbase = pipe.scheduler.set_timesteps
    def tspatch(*args, **kwargs):
        tsbase(*args, **kwargs)
        pipe.scheduler.sigmas = rescale_zero_terminal_snr_sigmas(pipe.scheduler.sigmas)
    pipe.scheduler.set_timesteps = tspatch

edited_image = pipe(
    prompt=prompt,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
    generator=generator,
    guidance_rescale=0.7,
).images[0]
edited_image.save("edited_image.png")

It uses the sigmas code ported by @Beinsezii in #6024.

image

However, with vanilla DDIM, the results are far worse:

import torch
from diffusers import StableDiffusionXLPipeline

model_id = "ptx0/terminus-xl-gamma-training"
pipe = StableDiffusionXLPipeline.from_pretrained(model_id, add_watermarker=False, torch_dtype=torch.bfloat16).to("cuda")
generator = torch.Generator("cuda").manual_seed(420420420)

prompt = "the artful dodger, cool dog in sunglasses sitting on a recliner in the dark, with the white noise reflecting on his sunglasses"
num_inference_steps = 30
guidance_scale = 7.5
edited_image = pipe(
    prompt=prompt,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
    generator=generator,
    guidance_rescale=0.7,
).images[0]
edited_image.save("edited_image.png")

image

Logs

No response

System Info

  • diffusers version: 0.21.4
  • Platform: Linux-5.19.0-45-generic-x86_64-with-glibc2.31
  • Python version: 3.9.16
  • PyTorch version (GPU?): 2.1.0+cu118 (True)
  • Huggingface_hub version: 0.16.4
  • Transformers version: 4.30.2
  • Accelerate version: 0.18.0
  • xFormers version: 0.0.22.post4+cu118
  • Using GPU in script?: A100-80G PCIe
  • Using distributed or parallel set-up in script?: FALSE

Who can help?

@patrickvonplaten @yiyixuxu

bghira avatar Dec 06 '23 01:12 bghira

For me, brighter images make it more noticeable.

DDIM

image

Euler (Patched)

image

bghira avatar Dec 06 '23 01:12 bghira

@yiyixuxu could you take a look here?

patrickvonplaten avatar Dec 06 '23 23:12 patrickvonplaten

hi @bghira is this resolved by #6024?

yiyixuxu avatar Dec 19 '23 04:12 yiyixuxu

For Euler, yes, but that one already appends an additional scheduler step

bghira avatar Dec 19 '23 04:12 bghira

I thought DDIM had incorrect samples regardless of ZSNR. If the solution is simply to use Euler and leave DDIM broken, then it may as well be deprecated.

The fact that Euler needs an extra 0 sigma to avoid the residual-noise issue, and that DPM has options such as euler_at_final, leads me to believe there's a bigger problem with how the samplers are called. So either DDIM and the rest all need Band-Aids, or that off-by-one issue (or whatever it is) needs to be found.
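
For reference, a quick way to inspect the tail of each scheduler's schedule and see whether it actually ends at zero (a sketch; the SDXL base repo is used purely as an example config source):

from diffusers import DDIMScheduler, EulerDiscreteScheduler

repo = "stabilityai/stable-diffusion-xl-base-1.0"  # any SDXL repo with a scheduler config works

euler = EulerDiscreteScheduler.from_pretrained(repo, subfolder="scheduler")
euler.set_timesteps(30)
print(euler.timesteps[:2], euler.timesteps[-2:])  # where the timestep schedule starts and stops
print(euler.sigmas[-3:])                          # Euler appends a trailing sigma of 0.0

ddim = DDIMScheduler.from_pretrained(repo, subfolder="scheduler")
ddim.set_timesteps(30)
print(ddim.timesteps[:2], ddim.timesteps[-2:])    # DDIM exposes no sigmas, so compare timesteps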

Beinsezii avatar Dec 19 '23 05:12 Beinsezii

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 15 '24 15:01 github-actions[bot]

that stale bot is the worst! @patrickvonplaten it should probably just be removed from the project due to how many good issues just get juked.

bghira avatar Jan 19 '24 02:01 bghira

also kinda crazy this remains an issue for more than a month?

bghira avatar Jan 19 '24 02:01 bghira

Could it be that this issue is resolved with: https://github.com/huggingface/diffusers/pull/6477 ? cc @yiyixuxu can you check?

patrickvonplaten avatar Jan 19 '24 10:01 patrickvonplaten

No, and that PR didn't fix the DPM multistep solver either; it still has residual noise. cc @AmericanPresidentJimmyCarter

bghira avatar Jan 19 '24 13:01 bghira

I am using the code from #6647 with DPMSolverMultistep, karras_timesteps, euler_at_final=True, and things like logos still have residual noise instead of outputting a flat colour as expected. Euler and DDIM do not seem to fix this.

image

And here it is with DPMSolverMultistep, karras_timesteps, final_sigmas_type="denoise_to_zero"

image

To easily reproduce the noise with solid colours, just prompt something like "Brooklyn pizza shop logo" and then open your fav image editor and crank brightness+contrast to see it clearly. Around the edges is probably jpg noise but I don't believe it all is.
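
For reference, the brightness/contrast boost can also be done in code (a sketch with Pillow; the filename and enhancement factors are placeholders):

from PIL import Image, ImageEnhance

img = Image.open("edited_image.png").convert("RGB")
img = ImageEnhance.Brightness(img).enhance(2.0)  # push brightness up first
img = ImageEnhance.Contrast(img).enhance(4.0)    # then contrast, so residual noise in flat areas stands out
img.save("edited_image_boosted.png")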

image

noise

image

the red dots are the "invisible" watermarker. #4014

bghira avatar Jan 19 '24 15:01 bghira

the red dots are the "invisible" watermarker. #4014

I am using:

class NoWatermark:
    def apply_watermark(self, img):
        return img
...
pipe.watermarker = NoWatermark()

edit: Oh, I see.

class NoWatermark:
    def apply_watermark(self, img):
        return img
...
- pipe.watermarker = NoWatermark()
+ pipe.watermark = NoWatermark()
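
(As an aside, the repro scripts at the top of this issue avoid the watermarker differently, by passing add_watermarker=False at load time. A minimal sketch, with the SDXL base repo as a placeholder:)

import torch
from diffusers import StableDiffusionXLPipeline

# Disables the invisible watermark when the pipeline is constructed,
# instead of patching pipe.watermark afterwards.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    add_watermarker=False,
    torch_dtype=torch.float16,
).to("cuda")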

Now there is still some noise, but it is reduced.

image

I think I have a similar or related issue.

I created an image with diffusers and Auto1111 using the same parameters but got different results, with the diffusers output being lower quality (especially noisier). Does anyone have an idea what could make that difference?

Relevant diffusers code with parameters:

pipe = StableDiffusionXLPipeline.from_single_file(".\models\Stable-diffusion\sdxl\sd_xl_base_1.0_0.9vae.safetensors", torch_dtype=torch.float16)
prompt = "concept art Amber Temple, snow, frigid air, snow-covered peaks of the mountains, dungeons and dragons style, dark atmosphere . digital artwork, illustrative, painterly, matte painting, highly detailed"
negative_prompt = "photo, photorealistic, realism, ugly"
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image = pipe(prompt, negative_prompt=negative_prompt, guidance_scale=8, num_inference_steps=20, width=1024, height=1024, generator=torch.Generator(device='cuda').manual_seed(1337), use_karras_sigmas=True).images[0]

Auto1111 (DPM++ 2M Karras): image

diffusers v0.25.1: image

Slightly different results. https://github.com/huggingface/diffusers/pull/6477 seemed like it would fix that issue, but it didn't. https://github.com/huggingface/diffusers/issues/6295 didn't help either.

djdookie avatar Jan 22 '24 23:01 djdookie

@djdookie, I think you have a typo in your code snippet. Note that you should pass use_karras_sigmas=True to the from_config(...) call, not to the pipeline call. The code snippet should look as follows:

pipe = StableDiffusionXLPipeline.from_single_file(".\models\Stable-diffusion\sdxl\sd_xl_base_1.0_0.9vae.safetensors", torch_dtype=torch.float16)
prompt = "concept art Amber Temple, snow, frigid air, snow-covered peaks of the mountains, dungeons and dragons style, dark atmosphere . digital artwork, illustrative, painterly, matte painting, highly detailed"
negative_prompt = "photo, photorealistic, realism, ugly"
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)
image = pipe(prompt, negative_prompt=negative_prompt, guidance_scale=8, num_inference_steps=20, width=1024, height=1024, generator=torch.Generator(device='cuda').manual_seed(1337)).images[0]

see the diff:

- pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config, use_karras_sigmas=True)

patrickvonplaten avatar Jan 23 '24 11:01 patrickvonplaten

When I run the corrected code in a Colab, I get good results: https://colab.research.google.com/drive/1IXZRZk6TYVG9uTDjocsUEfynfp5gyeoe?usp=sharing

(make sure to use current diffusers main here)

The resulting image looks very similar to the A1111 one:

image

patrickvonplaten avatar Jan 23 '24 11:01 patrickvonplaten

Using Karras sigmas is incompatible with zero-terminal SNR, no? I wouldn't say it looks very similar other than compositionally; the contrast is totally washed out

bghira avatar Jan 23 '24 14:01 bghira

@patrickvonplaten Good find. This indeed solved my issue, and I don't have washed-out colors, by the way. Thank you so much!

Before: image

After code correction: image

Still slightly different from the A1111 image I posted earlier, but quality is good again and the remaining noise is gone.

djdookie avatar Jan 24 '24 06:01 djdookie

That still has residual noise in the sky; you can see the splotchy colouring there. Try a vector-style image or any of the demo prompts from above.

bghira avatar Jan 24 '24 15:01 bghira

That still has residual noise in the sky; you can see the splotchy colouring there. Try a vector-style image or any of the demo prompts from above.

I don't see any splotchy colouring tbh, but maybe I'm also just getting old and my vision is weaker than it used to haha

patrickvonplaten avatar Jan 26 '24 12:01 patrickvonplaten

Using Karras sigmas is incompatible with zero-terminal SNR, no? I wouldn't say it looks very similar other than compositionally; the contrast is totally washed out

Not in my experience

I don't see any splotchy colouring tbh, but maybe I'm also just getting old and my vision is weaker than it used to haha

Montage comparing EulerDiscreteScheduler, DDIMScheduler, EulerDiscreteScheduler(use_karras_sigmas=True), and DPMSolverMultistepScheduler(use_karras_sigmas=True):

image

  • positive: flat vector artwork of a kitten looking up at the night sky
  • negative: blurry
  • Model: ptx0/terminus-xl-gamma-training (V-PRED ZSNR)
  • seed: 1 (cpu f32)
  • guidance 8 + 0.7 rescale
  • 30 steps

Using diffusers master (d4c7ab7bf1a00b8f416b3d20b77babac86f7fb44) with my own app. I think this is a fairly obvious demonstration that both DDIM and probably DPM have timestep issues. DPM doesn't have a ZSNR patch yet, so it will naturally have less contrast.
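
For context, the montage above could be reproduced with a sweep along these lines (a sketch that assumes the pipe from the earlier snippets is already loaded; flag availability depends on the diffusers version in use):

import torch
from diffusers import DDIMScheduler, DPMSolverMultistepScheduler, EulerDiscreteScheduler

cfg = pipe.scheduler.config
schedulers = {
    "euler": EulerDiscreteScheduler.from_config(cfg),
    "ddim": DDIMScheduler.from_config(cfg),
    "euler_karras": EulerDiscreteScheduler.from_config(cfg, use_karras_sigmas=True),
    "dpm_karras": DPMSolverMultistepScheduler.from_config(cfg, use_karras_sigmas=True),
}
for name, scheduler in schedulers.items():
    pipe.scheduler = scheduler
    image = pipe(
        prompt="flat vector artwork of a kitten looking up at the night sky",
        negative_prompt="blurry",
        num_inference_steps=30,
        guidance_scale=8.0,
        guidance_rescale=0.7,
        generator=torch.Generator("cpu").manual_seed(1),
    ).images[0]
    image.save(f"{name}.png")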

Beinsezii avatar Jan 27 '24 00:01 Beinsezii

That still has residual noise in the sky; you can see the splotchy colouring there. Try a vector-style image or any of the demo prompts from above.

I don't see any splotchy colouring tbh, but maybe I'm also just getting old and my vision is weaker than it used to haha

@patrickvonplaten I understand; it's something that you have to see quite a lot before you really recognise it.

One oddity is that the same seed produces the same splotchy pattern across every image; it's simply some deterministic noise being added or not completely removed.
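
One way to make that deterministic pattern visible is to difference a DDIM output against the patched-Euler output for the same seed and amplify the result (a sketch with Pillow; the filenames are placeholders):

from PIL import Image, ImageChops, ImageEnhance

ddim_img = Image.open("seed420_ddim.png").convert("RGB")
euler_img = Image.open("seed420_euler.png").convert("RGB")

diff = ImageChops.difference(ddim_img, euler_img)  # per-pixel absolute difference
diff = ImageEnhance.Brightness(diff).enhance(8.0)  # amplify it so the splotchy pattern stands out
diff.save("seed420_diff.png")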

bghira avatar Jan 28 '24 00:01 bghira

@bghira for ddim, do you want to open a PR with your fix so we can start from there? This issue is getting a little bit confusing now since we are also talking about issues across many schedulers

In some brief tests, it looks like simply adding an extra timestep with a zero sigma to the end of the schedule resolves the problem.

yiyixuxu avatar Jan 29 '24 19:01 yiyixuxu

No, I haven't had great experiences opening PRs for this project over the last handful of months; they become stale and close automatically.

bghira avatar Jan 29 '24 20:01 bghira

Hi @bghira

I reopened this one: https://github.com/huggingface/diffusers/issues/5969 - are there any other issues of yours that have been automatically closed? Please let me know.

I'm sorry that we let perfectly good issues go stale. This particular issue is a relatively low priority for me, and I haven't been able to find time to work on it because (1) DDIM is not a common choice for SDXL, and (2) scheduler PRs are the most time-consuming.

I should have been more upfront about this and should be more clear about the expectations. I'm sorry and I will do better next time. And please be a little bit patient with us in the meantime. Thanks

YiYi

yiyixuxu avatar Jan 30 '24 06:01 yiyixuxu

Well, since that time a colleague has ported zero-terminal SNR to Euler; DDIM was the only choice until then. I don't personally need this fixed, and I don't think DDIM is very useful considering Euler works basically the same. If you wanted to simply remove DDIM, I would think that's fine.

bghira avatar Jan 30 '24 13:01 bghira

the contrast is totally washed out

@bghira The washed out colors in SDXL are likely from this issue: #6753

spezialspezial avatar Jan 30 '24 15:01 spezialspezial

It could be in some cases, but in this one the user didn't move away from from_single_file, and their results had less of a contrast issue.

bghira avatar Jan 30 '24 15:01 bghira