
CADS support

loboere opened this issue on Jan 25, 2024 · 5 comments

https://arxiv.org/abs/2310.17347

loboere avatar Jan 25 '24 05:01 loboere

@loboere could you share some of the results/benefits from this method?

DN6 avatar Jan 25 '24 13:01 DN6

CADS is a technique that greatly increases the diversity of generated images by adding scheduled noise to the conditioning signal at inference time.
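Concretely, the corruption step from the paper (as I read it) replaces the conditioning $y$ at each sampling step with a noised version:

```latex
% CADS conditioning corruption, transcribed from the paper's formulation:
% s is a user-chosen noise scale, and gamma(t) is an annealing schedule
% that is 0 early in sampling (maximal corruption) and 1 near the end
% (clean conditioning), so the corruption fades out as the image forms.
\hat{y} = \sqrt{\gamma(t)}\, y + s \sqrt{1 - \gamma(t)}\, n,
\qquad n \sim \mathcal{N}(0, I)
```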


Here's an implementation of it as an AUTOMATIC1111 extension: https://github.com/v0xie/sd-webui-cads

apolinario avatar Feb 08 '24 14:02 apolinario

Is it a new scheduler?

yiyixuxu avatar Feb 08 '24 17:02 yiyixuxu

Is it a new scheduler?

I'm glad to see that you are interested in our paper! CADS is not technically a new scheduler but a technique that can be used on top of common diffusion schedulers (like DDPM or DDIM) to increase the diversity of the outputs.

Msadat97 avatar Feb 13 '24 19:02 Msadat97

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Mar 09 '24 15:03 github-actions[bot]

@Msadat97 I just finished reading the paper a few hours ago and I understand how to implement it. The A1111 results look great! Thanks for your awesome work.

To summarize the process (a rough code sketch follows the list):

  • Noise is added to the conditioning vector at every step of the inference process
  • As inference progresses toward completion, less and less noise is added
  • The resulting outputs are much more diverse than those from a static or dynamic guidance scale, as shown in the paper's DDPM experiments
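
A minimal sketch of that corruption step, assuming the conditioning is a PyTorch tensor. `tau1`, `tau2`, `noise_scale`, and `mixing_factor` follow the paper's hyperparameters, but the names and default values here are illustrative, not the paper's exact choices:

```python
import math

import torch


def cads_anneal(y, t, tau1=0.6, tau2=0.9, noise_scale=0.25,
                mixing_factor=1.0, rescale=True):
    """Corrupt a conditioning embedding `y` per CADS (arXiv:2310.17347).

    `t` is the normalized diffusion time in [0, 1]: t = 1 at the start of
    sampling (most noise on the conditioning), t = 0 at the end (none).
    """
    # Piecewise-linear annealing schedule gamma(t): 1 (clean) for t <= tau1,
    # 0 (fully corrupted) for t >= tau2, linear in between.
    if t <= tau1:
        gamma = 1.0
    elif t >= tau2:
        gamma = 0.0
    else:
        gamma = (tau2 - t) / (tau2 - tau1)

    # Mix the clean conditioning with Gaussian noise.
    y_hat = (math.sqrt(gamma) * y
             + noise_scale * math.sqrt(1.0 - gamma) * torch.randn_like(y))

    if rescale:
        # Optionally rescale the corrupted embedding back to the clean
        # embedding's mean/std, then blend the two versions (the psi
        # mixing in the paper) to keep the embedding in-distribution.
        rescaled = (y_hat - y_hat.mean()) / y_hat.std() * y.std() + y.mean()
        y_hat = mixing_factor * rescaled + (1.0 - mixing_factor) * y_hat

    return y_hat
```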

I believe this is a great way to improve diversity, but I noticed while experimenting that prompt following, or adherence to the subject image, is somewhat lost (due to the noise added to the conditioning). You highlight this as a minor loss in accuracy for class-conditional ImageNet generation. For general SD, though, it seems to have a slightly higher impact.

@yiyixuxu May I work on this? I have a feeling the functionality could be implemented with callback_on_step_end, combined with a few minor changes to support a before-inference call to callback_on_step_end as done here. Not sure though; I'll need to try sometime this weekend. A rough sketch of the wiring is below.
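
If the callback route works out, it might look roughly like this. This is a sketch only: it reuses the `cads_anneal` helper above, and since `callback_on_step_end` fires after each step, the very first denoising step would still see the clean conditioning without the before-inference change mentioned above:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

num_steps = 50


def cads_callback(pipeline, step_index, timestep, callback_kwargs):
    # Normalized time runs from ~1 (start of sampling) down to 0 (end).
    t = 1.0 - (step_index + 1) / num_steps
    # Under classifier-free guidance, `prompt_embeds` holds both the
    # unconditional and conditional embeddings concatenated; this sketch
    # noises both, which may differ from the paper's setup.
    callback_kwargs["prompt_embeds"] = cads_anneal(
        callback_kwargs["prompt_embeds"], t
    )
    return callback_kwargs


image = pipe(
    "an astronaut on the moon",
    num_inference_steps=num_steps,
    callback_on_step_end=cads_callback,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]
```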

a-r-r-o-w avatar Mar 20 '24 20:03 a-r-r-o-w

The diversity increase might be more obvious for models trained on smaller datasets than SD's. What we mainly observed is that when a prompt tends to produce repetitive results (like the astronaut on the moon), CADS can alleviate the issue quite a bit. As for prompt adherence, you have some control over it by shifting the CADS noise schedule toward the beginning of sampling (i.e., choosing tau_1 closer to 1); in the extreme case of tau_1 = 1, you recover the default sampling. We also noticed that deterministic samplers generally become more diverse with less added noise.
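
To make the tau_1 behavior concrete, here is the annealing schedule evaluated in isolation (my own illustration, not code from the paper): shifting tau_1 toward 1 confines the noise to the very start of sampling, and at tau_1 = 1 the schedule is identically 1, recovering default sampling.

```python
def gamma(t, tau1=0.6, tau2=0.9):
    """Piecewise-linear CADS annealing schedule (t = 1 is the start of sampling)."""
    if t <= tau1:
        return 1.0
    if t >= tau2:
        return 0.0
    return (tau2 - t) / (tau2 - tau1)


# gamma(t) == 1 means clean conditioning; smaller values mean more noise.
for tau1 in (0.6, 0.8, 1.0):
    row = [round(gamma(t, tau1=tau1, tau2=max(tau1, 0.9)), 2)
           for t in (0.0, 0.5, 0.7, 0.9, 1.0)]
    print(f"tau1={tau1}: {row}")
```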

I'd be happy to answer any other questions you might have regarding the paper. Thanks for taking the initiative.

Msadat97 avatar Mar 20 '24 21:03 Msadat97

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 14 '24 15:04 github-actions[bot]