
Analysis of Classifier-Free Guidance Weight Schedulers

Open rootonchair opened this issue 1 year ago • 7 comments

Model/Pipeline/Scheduler description

The paper's authors perform an analysis and propose a one-line change that makes Classifier-Free Guidance perform better.

cfg
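For context, my reading of that one-line change, as a minimal sketch in the noise-prediction form of CFG (the exact ramp shape and endpoints used in the paper may differ from this):

```python
# Static CFG: the guidance weight w is the same at every denoising step.
#   noise_pred = noise_uncond + w * (noise_text - noise_uncond)

# Linearly scheduled CFG (the "one-line change"): the effective weight ramps
# up with sampling progress, so early high-noise steps receive weaker
# guidance than late low-noise steps.
def guided_noise(noise_uncond, noise_text, w_max, step_index, num_steps):
    w = w_max * (step_index + 1) / num_steps  # the one changed line
    return noise_uncond + w * (noise_text - noise_uncond)
```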

I personally ran some tests to confirm.

SD1.5, DDIM scheduler, 50 steps, prompt: "a photograph of an astronaut riding a horse", seed: 1024

guidance scale: 7.5

  • static: sd_default
  • linear (proposed): sd_default_linear

guidance scale: 14.0

  • static: sd_org
  • linear (proposed): sd_linear
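For reproducibility, the static baseline above can be generated with something like the following sketch (assuming the `runwayml/stable-diffusion-v1-5` checkpoint and a CUDA device):

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Swap in the DDIM scheduler used for these tests.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=50,
    guidance_scale=7.5,  # 14.0 for the second set of images
    generator=torch.Generator("cuda").manual_seed(1024),
).images[0]
image.save("sd_default.png")
```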

Open source status

  • [ ] The model implementation is available.
  • [ ] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Paper: https://arxiv.org/abs/2404.13040

rootonchair avatar Apr 23 '24 18:04 rootonchair

Update: SDXL results, guidance scale: 14.0

  • static: sdxl_org
  • linear: sdxl_org_linear

rootonchair avatar Apr 23 '24 19:04 rootonchair

cc: @yiyixuxu and @asomoza for visibility.

DN6 avatar Apr 24 '24 06:04 DN6

what a neat way to make use of the knowledge already in the model!

bghira avatar Apr 26 '24 00:04 bghira

cc @asomoza can we make a callback for this?

yiyixuxu avatar Apr 26 '24 00:04 yiyixuxu

Yes, but there are a lot of techniques for manipulating the CFG, most of them without papers. I added the cutoff one because I know it's really popular, it makes generations faster, and it serves as a kind of example of how to manipulate the CFG.

Maybe we should let the community add these later on? Other, more popular ones are Automatic CFG and Dynamic Thresholding.

asomoza avatar Apr 26 '24 06:04 asomoza
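For anyone who wants to experiment with the linear schedule before an official callback exists, here is a rough sketch built on `callback_on_step_end`. The ramp shape, the clamp at 1, and the starting value are assumptions on my part, not the paper's exact recipe; `_guidance_scale` is the private attribute the pipeline reads internally, as in the dynamic-CFG docs example.

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

num_steps = 50
max_guidance = 14.0

def linear_cfg_callback(pipeline, step_index, timestep, callback_kwargs):
    # Ramp the guidance scale linearly up to max_guidance. The callback runs
    # at the end of a step, so the new value is used on the next step.
    progress = (step_index + 1) / num_steps
    # Keep the scale above 1: the pipeline disables CFG entirely once
    # guidance_scale <= 1, which would break the doubled prompt embeds.
    pipeline._guidance_scale = max(1.01, max_guidance * progress)
    return callback_kwargs

image = pipe(
    "a photograph of an astronaut riding a horse",
    num_inference_steps=num_steps,
    guidance_scale=max_guidance,  # value used for the very first step
    generator=torch.Generator("cuda").manual_seed(1024),
    callback_on_step_end=linear_cfg_callback,
).images[0]
```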

Hi, I'm interested in your work. Is there an explanation of WHY this phenomenon happens?

YunhoKim21 avatar May 16 '24 11:05 YunhoKim21

I'm not an expert on this, but at a cursory glance it seems to base the strength of guidance on the position in the timestep schedule. This also likely works because there are two types of attention being used by the model, with earlier timesteps dominated by cross-attention (relying heavily on the text conditioning) and later timesteps by self-attention (practically ignoring the prompt).

bghira avatar May 16 '24 11:05 bghira

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 14 '24 15:09 github-actions[bot]