
Timesteps are not calculated correctly for img2img pipelines.

Open okaris opened this issue 10 months ago • 27 comments

Describe the bug

I have arrived at this conclusion after experimenting with the LCM models/schedulers, and I believe it applies to all schedulers and img2img pipelines. For longer diffusion processes, the effects are more subtle but clearly surface in the case of the LCMScheduler.

For img2img, when we select a strength value, the expected behavior is to select the initial noise/timestep. For instance, a strength value of 0.8 implies adding 80% noise and starting the diffusion from the timestep corresponding to 80% of the total timesteps.

Most models are trained for 1000 timesteps. Considering a 0.8 strength value for diffusing over 20 steps, the timestep schedule should look like this:

[799, 749, 699, 649, 599, 549, 499, 449, 399, 349, 299, 249, 199, 149, 99, 49]
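As a rough sketch of the arithmetic behind that list (my reading of the expected behavior, not code taken from diffusers): a 20-step schedule over 1000 training timesteps has a spacing of 50, and a strength of 0.8 caps the starting timestep at 799. The variable names below are illustrative only.

```python
import numpy as np

num_train_timesteps = 1000   # assumed training horizon
num_inference_steps = 20     # steps of the full (strength = 1.0) schedule
strength = 0.8

# Spacing between consecutive timesteps of the full schedule.
step = num_train_timesteps // num_inference_steps
# Highest timestep allowed by the chosen strength (zero-indexed).
start = int(round(num_train_timesteps * strength)) - 1

# Walk down from the capped start to just above zero.
expected_timesteps = np.arange(start, 0, -step)
print(expected_timesteps.tolist())
```

This reproduces the 16 values listed above; it keeps the spacing of the full schedule rather than re-spreading 20 steps below the cap.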

This is implemented in the LCMScheduler class but it's not currently used.

Instead, the img2img pipelines retrieve normal timesteps from a scheduler and process the strength value in the get_timesteps method, as not all schedulers possess this functionality.

With the current implementation, if we want to denoise at 0.8 strength for 20 steps:

  1. We first retrieve the "normal" timesteps from the scheduler:

[951., 901., 851., 801., 751., 701., 651., 601., 551., 501., 451., 401., 351., 301., 251., 201., 151., 101., 51., 1.]

  2. We then select some of them based on strength, as demonstrated below. The timesteps remain the same for different strength values:
Strength: 0.8 Timesteps: [751. 701. 651. 601. 551. 501. 451. 401. 351. 301. 251. 201. 151. 101. 51.   1.]
Strength: 0.79 Timesteps: [751. 701. 651. 601. 551. 501. 451. 401. 351. 301. 251. 201. 151. 101. 51.   1.]
Strength: 0.78 Timesteps: [751. 701. 651. 601. 551. 501. 451. 401. 351. 301. 251. 201. 151. 101. 51.   1.]
Strength: 0.77 Timesteps: [751. 701. 651. 601. 551. 501. 451. 401. 351. 301. 251. 201. 151. 101. 51.   1.]
Strength: 0.76 Timesteps: [751. 701. 651. 601. 551. 501. 451. 401. 351. 301. 251. 201. 151. 101. 51.   1.]

The longer the diffusion process (num_inference_steps), the smaller the effect, but the issue is magnified with the LCMScheduler. Here are the timesteps for different strength values over 4 inference steps:

Strength: 0.99 Timesteps: [999 759 499 259]
Strength: 0.98 Timesteps: [999 759 499 259]
Strength: 0.97 Timesteps: [999 759 499 259]
Strength: 0.96 Timesteps: [999 759 499 259]
Strength: 0.95 Timesteps: [999 759 499 259]
Strength: 0.94 Timesteps: [999 759 499 259]
Strength: 0.93 Timesteps: [999 759 499 259]
Strength: 0.92 Timesteps: [999 759 499 259]
Strength: 0.91 Timesteps: [999 759 499 259]
Strength: 0.9 Timesteps: [999 759 499 259]
Strength: 0.89 Timesteps: [999 759 499 259]
Strength: 0.88 Timesteps: [999 759 499 259]
Strength: 0.87 Timesteps: [999 759 499 259]
Strength: 0.86 Timesteps: [999 759 499 259]
Strength: 0.85 Timesteps: [999 759 499 259]
Strength: 0.84 Timesteps: [999 759 499 259]
Strength: 0.83 Timesteps: [999 759 499 259]
Strength: 0.82 Timesteps: [999 759 499 259]
Strength: 0.81 Timesteps: [999 759 499 259]
Strength: 0.8 Timesteps: [999 759 499 259]
Strength: 0.79 Timesteps: [999 759 499 259]
Strength: 0.78 Timesteps: [999 759 499 259]
Strength: 0.77 Timesteps: [999 759 499 259]
Strength: 0.76 Timesteps: [999 759 499 259]
Strength: 0.75 Timesteps: [759 499 259]

This effectively leaves us with only a few functional strength values. Below is the correct and expected output:

Strength: 0.99 Timesteps: tensor([979, 739, 499, 259])
Strength: 0.98 Timesteps: tensor([979, 739, 499, 259])
Strength: 0.97 Timesteps: tensor([959, 719, 479, 239])
Strength: 0.96 Timesteps: tensor([959, 719, 479, 239])
Strength: 0.95 Timesteps: tensor([939, 719, 479, 239])
Strength: 0.94 Timesteps: tensor([939, 719, 479, 239])
Strength: 0.93 Timesteps: tensor([919, 699, 459, 239])
Strength: 0.92 Timesteps: tensor([919, 699, 459, 239])
Strength: 0.91 Timesteps: tensor([899, 679, 459, 239])
Strength: 0.9 Timesteps: tensor([899, 679, 459, 239])
Strength: 0.89 Timesteps: tensor([879, 659, 439, 219])
Strength: 0.88 Timesteps: tensor([879, 659, 439, 219])
Strength: 0.87 Timesteps: tensor([859, 659, 439, 219])
Strength: 0.86 Timesteps: tensor([859, 659, 439, 219])
Strength: 0.85 Timesteps: tensor([839, 639, 419, 219])
Strength: 0.84 Timesteps: tensor([839, 639, 419, 219])
Strength: 0.83 Timesteps: tensor([819, 619, 419, 219])
Strength: 0.82 Timesteps: tensor([819, 619, 419, 219])
Strength: 0.81 Timesteps: tensor([799, 599, 399, 199])
Strength: 0.8 Timesteps: tensor([799, 599, 399, 199])
Strength: 0.79 Timesteps: tensor([779, 599, 399, 199])
Strength: 0.78 Timesteps: tensor([779, 599, 399, 199])
Strength: 0.77 Timesteps: tensor([759, 579, 379, 199])
Strength: 0.76 Timesteps: tensor([759, 579, 379, 199])
Strength: 0.75 Timesteps: tensor([739, 559, 379, 199])

I'm happy to submit a PR, but I'm unsure if we want to refactor all the schedulers to support strength input or alter the get_timesteps method.

Thank you!

Reproduction

import torch
from diffusers import LCMScheduler, StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# Use a name that does not shadow the built-in `str`.
for pct in range(50, 100, 2):
    print("-----")
    strength = (100 - pct) / 100
    print("Strength:", strength)
    pipe.scheduler.set_timesteps(num_inference_steps=4, device="cuda")
    timesteps = pipe.get_timesteps(num_inference_steps=4, strength=strength, device="cuda")
    print("Timesteps Current:", timesteps[0].detach().cpu().numpy())

    pipe.scheduler.set_timesteps(num_inference_steps=4, strength=strength, device="cuda")
    print("Timesteps Correct:", pipe.scheduler.timesteps.detach().cpu().numpy())

Logs

-----
Strength: 0.99
Timesteps Current: [759 499 259]
Timesteps Correct: [979 739 499 259]
-----
Strength: 0.97
Timesteps Current: [759 499 259]
Timesteps Correct: [959 719 479 239]
-----
Strength: 0.95
Timesteps Current: [759 499 259]
Timesteps Correct: [939 719 479 239]
-----
Strength: 0.93
Timesteps Current: [759 499 259]
Timesteps Correct: [919 699 459 239]
-----
Strength: 0.91
Timesteps Current: [759 499 259]
Timesteps Correct: [899 679 459 239]
-----
Strength: 0.89
Timesteps Current: [759 499 259]
Timesteps Correct: [879 659 439 219]
-----
Strength: 0.87
Timesteps Current: [759 499 259]
Timesteps Correct: [859 659 439 219]
-----
Strength: 0.85
Timesteps Current: [759 499 259]
Timesteps Correct: [839 639 419 219]
-----
Strength: 0.83
Timesteps Current: [759 499 259]
Timesteps Correct: [819 619 419 219]
-----
Strength: 0.81
Timesteps Current: [759 499 259]
Timesteps Correct: [799 599 399 199]
-----
Strength: 0.79
Timesteps Current: [759 499 259]
Timesteps Correct: [779 599 399 199]
-----
Strength: 0.77
Timesteps Current: [759 499 259]
Timesteps Correct: [759 579 379 199]
-----
Strength: 0.75
Timesteps Current: [759 499 259]
Timesteps Correct: [739 559 379 199]
-----
Strength: 0.73
Timesteps Current: [499 259]
Timesteps Correct: [719 539 359 179]
-----
Strength: 0.71
Timesteps Current: [499 259]
Timesteps Correct: [699 539 359 179]
-----
Strength: 0.69
Timesteps Current: [499 259]
Timesteps Correct: [679 519 339 179]
-----
Strength: 0.67
Timesteps Current: [499 259]
Timesteps Correct: [659 499 339 179]
-----
Strength: 0.65
Timesteps Current: [499 259]
Timesteps Correct: [639 479 319 159]
-----
Strength: 0.63
Timesteps Current: [499 259]
Timesteps Correct: [619 479 319 159]
-----
Strength: 0.61
Timesteps Current: [499 259]
Timesteps Correct: [599 459 299 159]
-----
Strength: 0.59
Timesteps Current: [499 259]
Timesteps Correct: [579 439 299 159]
-----
Strength: 0.57
Timesteps Current: [499 259]
Timesteps Correct: [559 419 279 139]
-----
Strength: 0.55
Timesteps Current: [499 259]
Timesteps Correct: [539 419 279 139]
-----
Strength: 0.53
Timesteps Current: [499 259]
Timesteps Correct: [519 399 259 139]
-----
Strength: 0.51
Timesteps Current: [499 259]
Timesteps Correct: [499 379 259 139]
-----
Strength: 0.49
Timesteps Current: [259]
Timesteps Correct: [479 359 239 119]
-----
Strength: 0.47
Timesteps Current: [259]
Timesteps Correct: [459 359 239 119]
-----
Strength: 0.45
Timesteps Current: [259]
Timesteps Correct: [439 339 219 119]
-----
Strength: 0.43
Timesteps Current: [259]
Timesteps Correct: [419 319 219 119]
-----
Strength: 0.41
Timesteps Current: [259]
Timesteps Correct: [399 299 199  99]
-----
Strength: 0.39
Timesteps Current: [259]
Timesteps Correct: [379 299 199  99]
-----
Strength: 0.37
Timesteps Current: [259]
Timesteps Correct: [359 279 179  99]
-----
Strength: 0.35
Timesteps Current: [259]
Timesteps Correct: [339 259 179  99]
-----
Strength: 0.33
Timesteps Current: [259]
Timesteps Correct: [319 239 159  79]
-----
Strength: 0.31
Timesteps Current: [259]
Timesteps Correct: [299 239 159  79]
-----
Strength: 0.29
Timesteps Current: [259]
Timesteps Correct: [279 219 139  79]
-----
Strength: 0.27
Timesteps Current: [259]
Timesteps Correct: [259 199 139  79]
-----
Strength: 0.25
Timesteps Current: [259]
Timesteps Correct: [239 179 119  59]
-----
Strength: 0.23
Timesteps Current: []
Timesteps Correct: [219 179 119  59]
-----
Strength: 0.21
Timesteps Current: []
Timesteps Correct: [199 159  99  59]
-----
Strength: 0.19
Timesteps Current: []
Timesteps Correct: [179 139  99  59]
-----
Strength: 0.17
Timesteps Current: []
Timesteps Correct: [159 119  79  39]
-----
Strength: 0.15
Timesteps Current: []
Timesteps Correct: [139 119  79  39]
-----
Strength: 0.13
Timesteps Current: []
Timesteps Correct: [119  99  59  39]
-----
Strength: 0.11
Timesteps Current: []
Timesteps Correct: [99 79 59 39]
-----
Strength: 0.09
Timesteps Current: []
Timesteps Correct: [79 59 39 19]
-----
Strength: 0.07
Timesteps Current: []

System Info

  • diffusers version: 0.27.2
  • Platform: Linux-5.15.0-89-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • PyTorch version (GPU?): 2.0.0+cu117 (True)
  • Huggingface_hub version: 0.20.2
  • Transformers version: 4.37.1
  • Accelerate version: 0.29.2
  • xFormers version: not installed
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?: False

Who can help?

@yiyixuxu @DN6 @sayakpaul

okaris avatar Apr 12 '24 07:04 okaris

Below you can find example img2img sweeps with strength values from 0.5 to 1.0 comparing the current implementation in diffusers and the expected results when we use the correct timesteps:

SDXL

Current: lcm-strength-fix-grid-xl-orig

Fixed: lcm-strength-fix-grid-xl

SD1.5

Current: lcm-strength-fix-grid-orig

Fixed: lcm-strength-fix-grid

okaris avatar Apr 12 '24 10:04 okaris

Hello, I'm interested in your findings. Can you tell me how you modified the code to get the above result?

fkjkey avatar Apr 12 '24 10:04 fkjkey

@fkjkey for a naive temporary solution you can change the lines:

https://github.com/huggingface/diffusers/blob/279de3c3ffedcb1394518a8f1c950fa30f272390/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py#L1240-L1246

with:

timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps, strength=strength)
# timesteps, num_inference_steps = self.get_timesteps(
#     num_inference_steps,
#     strength,
#     device,
#     denoising_start=self.denoising_start if denoising_value_valid(self.denoising_start) else None,
# )

The same applies to the SD1.5 img2img pipeline.

Warning: This solution only works for LCM models using LCMScheduler for now

okaris avatar Apr 12 '24 10:04 okaris

Hi all, good discussion!

  1. In your reproduction code, why didn't you pass strength to the first set_timesteps call? The default is 1.0.
  2. In diffusers, a pipeline runs int(num_inference_steps * strength) steps in total, not num_inference_steps.
  3. AFAIK, pipe.scheduler.timesteps keeps all num_inference_steps=4 timesteps, not only the executed timesteps that pipe.get_timesteps() returns.
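For readers following along, here is a minimal sketch of the slicing described in point 2. It is a simplified stand-in for the pipeline's get_timesteps, assuming a first-order scheduler; the function name `sliced_timesteps` is hypothetical, and the 4-step schedule is the one from the logs above.

```python
import numpy as np

def sliced_timesteps(schedule, num_inference_steps, strength):
    """Hypothetical stand-in for the strength handling in get_timesteps."""
    # Number of steps actually executed for this strength.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Skip the first (noisiest) entries of the precomputed schedule.
    t_start = max(num_inference_steps - init_timestep, 0)
    return schedule[t_start:]

# Full 4-step LCM schedule, as reported in the logs of this issue.
lcm_schedule = np.array([999, 759, 499, 259])

for strength in (1.0, 0.75, 0.5, 0.25, 0.23):
    print(strength, sliced_timesteps(lcm_schedule, 4, strength).tolist())
```

Because the slice length is an integer, every strength between two multiples of 1/num_inference_steps selects the same timesteps, and strengths below 1/num_inference_steps select none.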

tolgacangoz avatar Apr 12 '24 12:04 tolgacangoz

Hi @standardAI

  1. Please see my response above. The diffusers code doesn’t use the strength parameter right now because not all schedulers support it.
  2. Diffusers makes that calculation in the get_timesteps method, which ends up being wrong.
  3. I’m not sure I get this one 🤔

okaris avatar Apr 12 '24 12:04 okaris

Looking at the link @standardAI shared, the strength parameter is documented as

The strength and num_inference_steps parameters are related because strength determines the number of noise steps to add. For example, if the num_inference_steps is 50 and strength is 0.8, then this means adding 40 (50 * 0.8) steps of noise to the initial image and then denoising for 40 steps to get the newly generated image.

I believe that instead of changing num_inference_steps based on strength, it makes more sense to control the amount of noise added by changing the starting timestep of the denoising process, which aligns better with the diffusion logic. This would also let the pipeline adhere to the actual number of denoising steps chosen by the user.

okaris avatar Apr 12 '24 14:04 okaris

Hmm, it seems that its origins are based on strong foundations 🤔. Also, newer ones.

tolgacangoz avatar Apr 12 '24 18:04 tolgacangoz

Well the above images pretty much speak for themselves 😄

okaris avatar Apr 12 '24 18:04 okaris

Thanks for bringing this up, it's very interesting indeed! As far as I understand, the proposed solution would be very nice in the sense that we get to decouple the number of inference steps from strength, i.e., no matter what value of strength you use you will always be doing num_inference_steps steps instead of re-defining it to be num_inference_steps*strength.

The set_timesteps method in LCMScheduler is quite... involved. I'm still quite new to diffusers so there's a lot of stuff to wrap my head around, and schedulers are already complicated enough as it is. But hypothetically, if we were to make this fix applicable to any scheduler, am I correct (more or less) in saying that you'd have to define timesteps as the following inside the call to the pipeline?:

T = num_train_timesteps
timesteps = np.linspace(0, int(strength*(T-1)), num_inference_steps)

E.g. for num_train_timesteps=1000 and num_inference_steps=4 (values cast to int32):

  • for strength=0.8 we get: array([ 0, 266, 532, 799], dtype=int32);
  • for strength=0.1 we get: array([ 0, 33, 66, 99], dtype=int32).
  • etc.
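A self-contained version of the snippet above, as a sketch only: the function name `linspace_timesteps` is made up for illustration, and the int32 cast is assumed from the dtype shown in the example arrays.

```python
import numpy as np

def linspace_timesteps(num_train_timesteps, num_inference_steps, strength):
    """Evenly spaced timesteps from 0 up to the strength-capped maximum."""
    # Highest zero-indexed timestep allowed by the chosen strength.
    last = int(strength * (num_train_timesteps - 1))
    # Truncate to integers, as in the example arrays above.
    return np.linspace(0, last, num_inference_steps).astype(np.int32)

print(linspace_timesteps(1000, 4, 0.8))
print(linspace_timesteps(1000, 4, 0.1))
```

The arrays come out in ascending order; a denoising loop would presumably iterate over them reversed, from the noisiest timestep down.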

Thanks.

christopher-beckham avatar Apr 12 '24 21:04 christopher-beckham

@yiyixuxu @DN6 @sayakpaul any thoughts on how to proceed here?

okaris avatar Apr 14 '24 09:04 okaris

@yiyixuxu Could you take a look here please?

DN6 avatar Apr 15 '24 03:04 DN6

@okaris Can you share a minimal code example demonstrating how you're calculating the timesteps with strength?

DN6 avatar Apr 15 '24 03:04 DN6

@DN6 my implementation is based on: https://github.com/huggingface/diffusers/blob/b69fd990ad8026f21893499ab396d969b62bb8cc/src/diffusers/schedulers/scheduling_lcm.py#L397-L400 https://github.com/huggingface/diffusers/blob/b69fd990ad8026f21893499ab396d969b62bb8cc/src/diffusers/schedulers/scheduling_lcm.py#L479-L483

Here's a simple and naive implementation based on the LCMScheduler:

import numpy as np

num_timesteps = 1000
original_inference_steps = 50
num_inference_steps = 20
denoising_strength = 0.8
k = num_timesteps // original_inference_steps
original_timesteps = np.asarray(list(range(1, int(original_inference_steps * denoising_strength) + 1))) * k - 1
original_timesteps = original_timesteps[::-1].copy()
inference_indices = np.linspace(0, len(original_timesteps), num=num_inference_steps, endpoint=False)
inference_indices = np.floor(inference_indices).astype(np.int64)
timesteps = original_timesteps[inference_indices]
len(timesteps), timesteps

(20, array([799, 759, 719, 679, 639, 599, 559, 519, 479, 439, 399, 359, 319, 279, 239, 199, 159, 119, 79, 39]))

And below is my simpler implementation that would also work better with other schedulers where original_timesteps is not relevant.

import numpy as np

num_timesteps = 1000
num_inference_steps = 20
denoising_strength = 0.82
k = num_timesteps * denoising_strength // num_inference_steps
timesteps = np.asarray(list(range(0, num_inference_steps))) * k + 1
timesteps = timesteps[::-1]
len(timesteps), timesteps

(20, array([780., 739., 698., 657., 616., 575., 534., 493., 452., 411., 370., 329., 288., 247., 206., 165., 124., 83., 42., 1.]))

The most important points here are:

  • The current implementation makes selecting a desired number of inference steps very difficult, and in some cases impossible, which alters the output significantly, especially with schedulers that require fewer denoising steps. This is particularly relevant now given the popularity of DPMSolver schedulers and LCM, Lightning, and Turbo models in the img2img context.
  • It also prevents us from selecting a desired starting timestep, because it always applies the strength to a preselected timestep array.
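To put a number on the second point, the sketch below counts how many distinct 4-step schedules a strength sweep can produce under each approach. Both functions are simplified stand-ins written for this comparison (the names `sliced` and `strength_aware` are hypothetical); the second follows the naive LCM-style implementation above.

```python
import numpy as np

def sliced(schedule, num_inference_steps, strength):
    """Current behavior: drop the first entries of a fixed schedule."""
    keep = min(int(num_inference_steps * strength), num_inference_steps)
    return tuple(schedule[num_inference_steps - keep:])

def strength_aware(num_train_timesteps, num_inference_steps, strength, original_steps=50):
    """LCM-style behavior: cap the original schedule, then subsample it."""
    k = num_train_timesteps // original_steps
    original = (np.arange(1, int(original_steps * strength) + 1) * k - 1)[::-1]
    idx = np.linspace(0, len(original), num=num_inference_steps, endpoint=False)
    return tuple(original[np.floor(idx).astype(np.int64)])

steps = 4
full = strength_aware(1000, steps, 1.0)          # full-strength schedule
strengths = [i / 100 for i in range(10, 101)]    # sweep 0.10 .. 1.00

n_sliced = len({sliced(full, steps, s) for s in strengths})
n_aware = len({strength_aware(1000, steps, s) for s in strengths})
print("distinct schedules, sliced:", n_sliced)
print("distinct schedules, strength-aware:", n_aware)
```

With slicing there can be at most num_inference_steps + 1 distinct outcomes (five here, including the empty schedule), while the strength-aware variant yields a different starting timestep for most strength values.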

okaris avatar Apr 15 '24 07:04 okaris

Also because I like graphs here's the result of a strength sweep for different timesteps. It shows which timesteps are used and how much. lcm-compare

okaris avatar Apr 15 '24 11:04 okaris

Thanks for the nice discussion! First of all, this is not a bug: it is the intended behavior, and we followed the original implementation.

cc @asomoza here - let me know what you think?

yiyixuxu avatar Apr 15 '24 16:04 yiyixuxu

When you take into account usability, what @okaris is proposing is the way I thought it should work the first time I used the image-to-image pipeline.

Both comfyui and automatic1111 do this. It doesn’t matter what strength you choose, they always do the steps you put on top of it. Probably, most people who are used to user interfaces expect this behavior.

It’s not intuitive when a user sets a certain number of steps, but the generation process doesn’t actually follow this input. As a result, you end up having to calculate yourself what strength and steps you need to pass to actually get the same result.

asomoza avatar Apr 15 '24 18:04 asomoza

@asomoza + if you check the plot I shared, you are also limited in the actual timesteps you can use (out of the available 100) no matter what step/strength combination you choose, especially at lower inference steps.

okaris avatar Apr 15 '24 18:04 okaris

Oh yeah I saw that and I agree but most people use 25-30 or even 50 steps when they want quality generations and I don't think it's that noticeable there. Because of that I based my opinion on the UX perspective only.

But it just came to me that this will also help those real-time generation apps that use LCM. Have you tested this with Lightning models?

asomoza avatar Apr 15 '24 18:04 asomoza

@asomoza Most XL models and in particular the Lightning models behave rather strangely with the img2img pipelines, which needs a separate investigation but the timestep issue is still very apparent. Below is a comparison of proposed fix and original with ByteDance Lightning 4 Step Unet, strength sweep 0 -> 1

Diffusers v0.27.2 lightning-4step-org

Proposed lightning-4step-fixed

okaris avatar Apr 15 '24 20:04 okaris

Maybe this is related to the most recent reply ^ but we have to be quite careful when evaluating strength-based denoising (something img2img implements) on "fast" models. I'm not (yet) familiar with the training scheme for Lightning, but at least for the original LCM paper the model is trained on a very specific set of timestep indices (e.g. 4-6 of them, evenly spaced in the interval [0,T]). In that case it's not super clear to me how the model behaves for timesteps outside that set.

It would be easier to just start off with evaluating how the proposed fix vs original implementation performs on standard diffusion models first.

christopher-beckham avatar Apr 16 '24 16:04 christopher-beckham

@christopher-beckham if you refer to my initial messages you can see that the LCMScheduler already manages this better than the override in the pipeline. Afaik LCM is distilled with 50 DDIM steps, which means it has seen every step that is a multiple of 20. With the current diffusers implementation, most of these steps that the models were trained for are never used.
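A quick sketch of that claim, assuming the 50 distilled timesteps follow the same formula as the naive implementation earlier in the thread (zero-indexed, so the values are 19, 39, ..., 999) and taking the 4-step schedule from the logs as the only timesteps slicing can ever select:

```python
import numpy as np

num_train_timesteps = 1000
original_inference_steps = 50   # DDIM steps assumed for LCM distillation

# Timesteps the model is assumed to have seen during distillation.
k = num_train_timesteps // original_inference_steps
distilled = np.arange(1, original_inference_steps + 1) * k - 1

# Slicing a fixed 4-step schedule can only ever pick from these values.
reachable_by_slicing = {999, 759, 499, 259}

unused = sorted(set(distilled.tolist()) - reachable_by_slicing)
print(len(distilled), "distilled timesteps,", len(unused), "never used with 4 steps")
```

Under these assumptions, all but four of the distilled timesteps are unreachable at 4 inference steps, whatever strength is chosen.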

The proposed solution also works on standard diffusion models. However, as you can see in the graph above, the effect is much smaller when more timesteps are used to denoise, making the impact negligible.

okaris avatar Apr 16 '24 17:04 okaris

I noticed in your implementation [1] tends to be the last timestep, since I'm indexing from zero that basically corresponds to assuming we are at x_{t=1} and want to denoise to x_0 (i.e. no noise). I'm just wondering if that is ideal since it's so close to 0 that we're effectively doing num_inference_steps-1 steps (because the last denoising step is at t=1).

I wrote some code to plot our schedules. My implementation (below as chris) differs slightly from the one I wrote earlier in this thread:

import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np

def lcm(num_timesteps, num_inference_steps, denoising_strength, original_inference_steps=50): 
    k = num_timesteps // original_inference_steps
    original_timesteps = np.asarray(list(range(1, int(original_inference_steps * denoising_strength) + 1))) * k - 1
    original_timesteps = original_timesteps[::-1].copy()
    inference_indices = np.linspace(0, len(original_timesteps), num=num_inference_steps, endpoint=False)
    inference_indices = np.floor(inference_indices).astype(np.int64)
    timesteps = original_timesteps[inference_indices]
    return timesteps

def omer_v2(num_timesteps, num_inference_steps, denoising_strength):
    k = num_timesteps * denoising_strength // num_inference_steps
    timesteps = np.asarray(list(range(0, num_inference_steps))) * k # + 1
    timesteps = timesteps[::-1]
    return timesteps

def chris(num_timesteps, num_inference_steps, denoising_strength):
    timesteps = np.linspace(0, int(denoising_strength*(num_timesteps-1)), num_inference_steps+1)
    return timesteps[::-1][0:-1].astype(np.int32)

Things to note:

  • I removed the + 1 from your implementation in omer_v2 since I wanted to keep everything zero indexed. Hope I interpreted your code correctly.
  • My code (chris) differs slightly from my earlier reply a few days ago.
  • lcm defaults original_inference_steps to 50.
  • Here, all timestep indices are zero indexed, so timestep 0 in code is really timestep 1 mathematically (the smallest possible noise level). By that same token, timestep T-1 in code is really timestep T mathematically.

Here are some example generations:

out_lcm=lcm(1000, 2, 0.8)
out_omer_v2=omer_v2(1000, 2, 0.8)
out_chris=chris(1000, 2, 0.8)
plt.plot(out_lcm)
plt.plot(out_omer_v2)
plt.plot(out_chris)
plt.title("T=1000, 2 steps, 0.8 strength")
plt.legend(["lcm {}".format(out_lcm), "omer {}".format(out_omer_v2), "chris {}".format(out_chris)])
plt.grid()
image

For 3 steps:

out_lcm=lcm(1000, 3, 0.8)
out_omer_v2=omer_v2(1000, 3, 0.8)
out_chris=chris(1000, 3, 0.8)
plt.plot(out_lcm)
plt.plot(out_omer_v2)
plt.plot(out_chris)
plt.title("T=1000, 3 steps, 0.8 strength")
plt.legend(["lcm {}".format(out_lcm), "omer {}".format(out_omer_v2), "chris {}".format(out_chris)])
plt.grid()
image

Does that look about right to you? Let me know if I missed something.

christopher-beckham avatar Apr 16 '24 18:04 christopher-beckham

That looks right. My naive examples were just to demonstrate the desired output for models that work with more timesteps as well. I think the LCM implementation is on point. Your implementation (and LCM) might not work well with other models because they usually finish with timestep 1 or zero terminal SNR, which is what I tried to cover.

okaris avatar Apr 16 '24 18:04 okaris

@yiyixuxu my priority need is the LCM which already has a different strategy for strength implemented in LCMScheduler. To avoid the complexity here, does it make sense for me to create a community pipeline for LCM img2img and submit a PR?

okaris avatar Apr 18 '24 06:04 okaris

@okaris community pipeline sounds good!

yiyixuxu avatar Apr 19 '24 05:04 yiyixuxu

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 13 '24 15:05 github-actions[bot]

Have you experimented with models that use non-linear noise schedules such as CosXL or DeepFloyd? Both use cosine, but DeepFloyd's differs in that it's not cosine-continuous but squared cosine with capped values (squaredcos_cap_v2).

@asomoza Most XL models and in particular the Lightning models behave rather strangely with the img2img pipelines, which needs a separate investigation but the timestep issue is still very apparent. Below is a comparison of proposed fix and original with ByteDance Lightning 4 Step Unet, strength sweep 0 -> 1

Diffusers v0.27.2 lightning-4step-org

Proposed lightning-4step-fixed

Is the proposed schedule at 0.0 strength still completely changing the outputs? I might not be understanding the sample clearly.

bghira avatar May 18 '24 18:05 bghira

Closing this issue because of inactivity. Feel free to reopen.

sayakpaul avatar Jun 29 '24 13:06 sayakpaul

@asomoza Most XL models and in particular the Lightning models behave rather strangely with the img2img pipelines, which needs a separate investigation but the timestep issue is still very apparent. Below is a comparison of proposed fix and original with ByteDance Lightning 4 Step Unet, strength sweep 0 -> 1

Diffusers v0.27.2 lightning-4step-org

Proposed lightning-4step-fixed

@okaris could you please share your code to reproduce the results with SDXL-Lightning that you provided? I am now using DPM Single Step and cannot reproduce results from ComfyUI.

VladAndronik avatar Oct 15 '24 15:10 VladAndronik