native_guidance_scale parameter for LCMs in StableDiffusionXLPipeline

Open ivanprado opened this issue 5 months ago • 16 comments

What does this PR do?

Distilled LCMs don't perform regular classifier-free guidance. Instead, they pass guidance_scale as conditioning to the U-Net. This halves the compute required per step, because the negative (unconditional) prediction is not needed.

But in practice, we have seen that being able to also perform regular classifier-free guidance in addition to the conditional guidance_scale can be useful:

It allows negative prompts to be used again.
It provides better quality/prompt adherence in some cases.

This PR introduces a new parameter, native_guidance_scale, that can be used with distilled LCM models to perform regular classifier-free guidance.
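
For reference, regular classifier-free guidance combines two U-Net predictions per step, which is where the doubled cost comes from. Below is a minimal, framework-free sketch of that combination. The function name and the toy values are illustrative and not taken from the pipeline; the assumption is that native_guidance_scale plays the role of `scale` here.

```python
def classifier_free_guidance(noise_uncond, noise_text, scale):
    """Combine unconditional and text-conditioned noise predictions.

    Standard classifier-free guidance: uncond + scale * (text - uncond).
    """
    return [u + scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy "noise predictions" standing in for U-Net outputs (illustrative values).
noise_uncond = [0.0, 1.0, 2.0]
noise_text = [1.0, 1.0, 4.0]

# scale = 1.0 reproduces the text-conditioned prediction (no extra guidance).
no_guidance = classifier_free_guidance(noise_uncond, noise_text, 1.0)

# scale = 1.5 pushes the prediction further away from the unconditional one.
guided = classifier_free_guidance(noise_uncond, noise_text, 1.5)
```

The unconditional branch is also where a negative prompt enters: its embedding replaces the empty-prompt embedding used to compute `noise_uncond`.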

An example

Code to test the change in Text2Image pipeline:

from diffusers import DiffusionPipeline, UNet2DConditionModel, LCMScheduler
import torch

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

prompt = "High altitude snowy mountains"

generator = torch.manual_seed(0)
image = pipe(
    prompt=prompt, num_inference_steps=6, generator=generator, guidance_scale=8.0
).images[0]

generator = torch.manual_seed(0)
image_native_cfg = pipe(
    prompt=prompt, num_inference_steps=6, generator=generator, guidance_scale=8, native_guidance_scale=1.5
).images[0]

Resultant images:

[Image: Screenshot 2024-02-16 at 10 32 56] [Image: Screenshot 2024-02-16 at 10 33 06]

Code to test the change in img2img pipeline:

import torch
from diffusers import AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler
from diffusers.utils import make_image_grid, load_image

unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True, unet=unet
)
pipeline.enable_model_cpu_offload()
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)

# prepare image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-sdxl-init.png"
init_image = load_image(url)

prompt = "A painting of an astronaut in a jungle, cold color palette."

# pass prompt and image to pipeline
generator = torch.manual_seed(33)
image = pipeline(prompt, image=init_image, strength=0.5, num_inference_steps=6, generator=generator).images[0]
generator = torch.manual_seed(33)
image_new = pipeline(prompt, image=init_image, strength=0.5, num_inference_steps=6, generator=generator, native_guidance_scale=2.5).images[0]
make_image_grid([init_image, image, image_new], rows=3, cols=1)

Resultant images:

@patrickvonplaten and @sayakpaul

Feb 16 '24 09:02 ivanprado

Cc: @patil-suraj

Feb 16 '24 10:02 sayakpaul

@yiyixuxu Unfortunately, it is not possible to implement this using callback_on_step_end. The reasons are:

The property do_classifier_free_guidance returns False when a distilled LCM model is used. But we need it to return True for this feature. We could use a new parameter to force this (e.g. force_classifier_free_guidance). It would not be LCM specific.
Even after introducing the new parameter, a solution using callback_on_step_end won't allow modifying the guidance_scale for the first step, because the callback is executed at the end of each step. The first step would therefore always run with the original guidance scale, and the first step is very important.
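
The second point can be illustrated with a toy loop. This is only a sketch: `ToyPipeline`, `run_denoising_loop`, and `raise_scale` are hypothetical names, and the loop only mimics the order in which a step and its end-of-step callback run.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline; only tracks a guidance scale."""

    def __init__(self, guidance_scale):
        self.guidance_scale = guidance_scale


def run_denoising_loop(pipe, num_steps, callback_on_step_end=None):
    """Record the guidance scale each step actually uses."""
    used_scales = []
    for step_index in range(num_steps):
        # The step runs first, with whatever scale is currently set...
        used_scales.append(pipe.guidance_scale)
        # ...and only afterwards does the end-of-step callback fire.
        if callback_on_step_end is not None:
            callback_on_step_end(pipe, step_index)
    return used_scales


def raise_scale(pipe, step_index):
    # Any change made here can only affect the *next* step.
    pipe.guidance_scale = 1.5


used = run_denoising_loop(ToyPipeline(guidance_scale=0.0), 3, raise_scale)
print(used)
```

The first entry of `used` keeps the initial scale because the callback has not yet run when step 0 executes.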

I understand why you don't like adding a new parameter that works just for LCMs, but I don't see good alternative solutions and the proposed one here is harmless. What do you think?

Feb 27 '24 15:02 ivanprado

@ivanprado we are happy to extend our callback functionalities to make this work!

Feb 27 '24 16:02 yiyixuxu

cc @vladmandic @asomoza @DN6 here as well

should we introduce a callback_on_step_begin ?

Feb 27 '24 21:02 yiyixuxu

I understand why you don't like adding a new parameter that works just for LCMs, but I don't see good alternative solutions and the proposed one here is harmless. What do you think?

agree it is harmless, but if we follow that principle and add parameters for every small use case, we will quickly get overwhelmed - we introduced callback loops for this exact reason. note that we have some parameters, such as guidance_rescale, that were introduced before the callback parameter existed and that we would not have needed to add otherwise.

It will be much easier for users to tweak our pipelines too, without having to submit PRs

Feb 27 '24 21:02 yiyixuxu

callback_on_step_begin

no issues with that on my side - and having more callbacks just gives more flexibility without too much complexity for normal users, as they don't have to be used. i'd say the bigger issue is on your side - callbacks should be as uniform as possible between different pipelines, so while introducing a new one is fine, it's less than ideal if it's present in just one pipeline, and updating all pipelines is probably not something you're looking forward to.

Feb 27 '24 22:02 vladmandic

Maybe I'm missing something but in this case wouldn't it be better to just remove the check self.unet.config.time_cond_proj_dim is None so people can choose if they want to use it? Isn't this the same for the Turbo and Lightning models? People know that they have to keep the CFG at 1.0 to get the speed, but they can still choose a value above 1.0 if they want better quality or control.

I think this can be resolved with just documentation, in my code I don't have that check for the same reason.

Feb 27 '24 22:02 asomoza

Maybe I'm missing something but in this case wouldn't it be better to just remove the check self.unet.config.time_cond_proj_dim is None so people can choose if they want to use it?

it would change the expected behavior and be backward-breaking then

Feb 27 '24 23:02 yiyixuxu

it would change the expected behavior and be backward-breaking then

In that case I don't see any alternative other than adding the callback. I agree with @vladmandic that more callbacks add more flexibility, but personally I don't see a real use case for callback_on_step_begin yet.

@a-r-r-o-w did an experiment with this in https://github.com/huggingface/diffusers/issues/7038#issuecomment-1960700209 though.

Feb 28 '24 01:02 asomoza

Something to keep in mind: even if callback_on_step_begin is introduced, two other changes are still required to make the feature in this PR possible:

Add a new parameter force_classifier_free_guidance to bypass the check self.unet.config.time_cond_proj_dim is None
Fix the bug when cfg is applied for models with time_cond_proj_dim
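
A minimal sketch of the first change, written as a standalone function rather than the pipeline property. The `time_cond_proj_dim is None` check is the one quoted in this thread; the `guidance_scale > 1` half of the condition is an assumption inferred from the proposed parameter description, and `force` is a hypothetical argument modelled on force_classifier_free_guidance.

```python
def do_classifier_free_guidance(guidance_scale, time_cond_proj_dim, force=False):
    """Decide whether to run the extra unconditional U-Net pass.

    time_cond_proj_dim is set (not None) for distilled LCM U-Nets, which
    normally disables classifier-free guidance. `force` is a hypothetical
    override that skips both checks.
    """
    if force:
        return True
    # Assumed default behaviour: CFG only for scale > 1 on non-LCM U-Nets.
    return guidance_scale > 1 and time_cond_proj_dim is None


# Regular SDXL U-Net: CFG runs as usual.
regular = do_classifier_free_guidance(8.0, None)

# Distilled LCM U-Net (256 is an illustrative dimension): CFG is skipped.
lcm_default = do_classifier_free_guidance(8.0, 256)

# Same LCM U-Net with the override: CFG runs again.
lcm_forced = do_classifier_free_guidance(8.0, 256, force=True)
```

Defaulting `force` to False keeps existing calls unchanged, which is the backward-compatibility concern raised above.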

Feb 28 '24 09:02 ivanprado

I'm working on changing the code. Particularly, I propose to add the following parameters:

            callback_on_step_end_also_before_start (`bool`, *optional*, defaults to False):
                If `True`, the `callback_on_step_end` function will also be called before the start of the inference.
                The callback will receive -1 as step to identify this particular case, in which some tensors
                might not be available.
            force_classifier_free_guidance (`bool`, *optional*, defaults to False):
                Forces the execution of classifier free guidance, even if the guidance scale is below 1 or the model
                is an LCM model.

Early feedback is welcome.

Mar 11 '24 16:03 ivanprado

@yiyixuxu this is ready for a re-review. I've removed the old parameters, and introduced the following ones that are also backward compatible:

  callback_on_step_end_also_at_init (`bool`, *optional*, defaults to False):
      If `True`, the `callback_on_step_end` function will also be called before the start of the inference.
      The callback will receive -1 as step to identify this particular case, in which some tensors
      might not be available.
  force_classifier_free_guidance (`bool`, *optional*, defaults to False):
      Forces the execution of classifier free guidance, even if the guidance scale is below 1 or the model
      is an LCM model.

The test cases have been modified accordingly.

Mar 12 '24 12:03 ivanprado

@ivanprado Great work with this! Just curious: can't the functionality of callback_on_step_end_also_at_init be achieved with callback_on_step_begin? You can do pre-inference work with some conditional logic when i==0. I mention it because if we push for a begin callback, we can integrate things like differential diffusion quite easily across all inpaint pipelines, which also requires some things to be set up before the inference loop starts. WDYT?

Mar 12 '24 13:03 a-r-r-o-w

@a-r-r-o-w note that you could obtain almost the same effect that you get with callback_on_step_begin by using a callback_on_step_end with callback_on_step_end_also_at_init but ignoring the last invocation of the callback. For example:

steps = 10

def callback_on_step_begin(pipe, step_index, timestep, callback_kwargs):
  # Your implementation here
  return callback_kwargs

def callback_on_step_end(pipe, step_index, timestep, callback_kwargs):
  # Called with step_index == -1 before inference, then after every step.
  # Shift by one so each call acts as the "begin" of the following step.
  step_index += 1
  # Ignore the invocation after the last step: no step follows it.
  if step_index != steps:
    callback_kwargs = callback_on_step_begin(pipe, step_index, timestep, callback_kwargs)
  return callback_kwargs

result = pipe(
    prompt=prompt, num_inference_steps=steps,
    callback_on_step_end=callback_on_step_end,
    callback_on_step_end_also_at_init=True,
).images[0]

The only problem I see is with the timestep, which will only be correct for the first step. Every later call will receive the timestep of the previous step.

But if we had access to the timesteps array in the callback, this wouldn't be a problem. Is there anything else you would miss?
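
A toy sketch of that idea, assuming the timesteps array is available to the wrapper. `run_loop` and `make_begin_adapter` are hypothetical helpers, and the callback signatures are simplified to (step_index, timestep) instead of the full pipeline signature.

```python
def run_loop(timesteps, callback_on_step_end, also_at_init=False):
    """Toy loop: calls the callback after each step, optionally once before."""
    if also_at_init:
        # Init call uses step -1; no timestep has been processed yet.
        callback_on_step_end(-1, None)
    for step_index, timestep in enumerate(timesteps):
        callback_on_step_end(step_index, timestep)


def make_begin_adapter(callback_on_step_begin, timesteps):
    """Wrap a begin-style callback so it can be passed as an end-style one."""
    def callback_on_step_end(step_index, timestep):
        next_index = step_index + 1
        # Skip the invocation after the last step: no step follows it.
        if next_index < len(timesteps):
            # Look up the upcoming step's timestep instead of the stale one.
            callback_on_step_begin(next_index, timesteps[next_index])
    return callback_on_step_end


seen = []
timesteps = [999, 759, 499, 259]  # illustrative values
adapter = make_begin_adapter(lambda i, t: seen.append((i, t)), timesteps)
run_loop(timesteps, adapter, also_at_init=True)
```

With the lookup in place, the begin-style callback sees each step index paired with that step's own timestep, including the first one.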

Mar 12 '24 15:03 ivanprado

Hi @yiyixuxu, I've already implemented the suggested changes. It would be nice if you could have a look.

Apr 03 '24 08:04 ivanprado

good candidate for #7761

Apr 25 '24 14:04 bghira
