Attention temperature
What does this PR do?
In diffusion models such as stabilityai/stable-diffusion-xl-base-1.0, generating images at resolutions far below the base training resolution gives poor results.
Excellent work in Extraltodeus/Stable-Diffusion-temperature-settings shows improvement is possible in this area.
For simplicity, this PR initially modifies only AttnProcessor2_0; the changes can be replicated to the other processors once reviewed.
`temperature` can be passed via `cross_attention_kwargs`, and the scale is calculated as `attn.scale / temperature`:
```python
cross_attention_kwargs = {"temperature": 1.5}
image = pipe(prompt, cross_attention_kwargs=cross_attention_kwargs, height=512, width=512).images[0]
```
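For context, a fuller end-to-end sketch of the intended usage (the prompt is illustrative, and the `temperature` entry assumes this PR's modified AttnProcessor2_0):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"  # illustrative prompt
image = pipe(
    prompt,
    cross_attention_kwargs={"temperature": 1.5},  # assumes this PR's processor change
    height=512,
    width=512,
).images[0]
image.save("astronaut_512.png")
```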
At temperature=1 the calculated scale matches attn.scale (note that some TODO comments mention adding support for attn.scale). This value does fix lower-resolution generation, but higher temperature values, as above, are better.
| Temperature | Scale (`attn.scale / temperature`) |
|---|---|
| 0.1 | 1.25 |
| 0.2 | 0.625 |
| 0.3 | 0.416667 |
| 0.4 | 0.3125 |
| 0.5 | 0.25 |
| 0.6 | 0.208333 |
| 0.7 | 0.178571 |
| 0.8 | 0.15625 |
| 0.9 | 0.138889 |
| 1.0 | 0.125 |
| 1.1 | 0.113636 |
| 1.2 | 0.104167 |
| 1.3 | 0.096154 |
| 1.4 | 0.089286 |
| 1.5 | 0.083333 |
| 1.6 | 0.078125 |
| 1.7 | 0.073529 |
| 1.8 | 0.069444 |
| 1.9 | 0.065789 |
| 2.0 | 0.0625 |
| 2.1 | 0.059524 |
| 2.2 | 0.056818 |
| 2.3 | 0.054348 |
| 2.4 | 0.052083 |
| 2.5 | 0.05 |
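The table can be reproduced directly; a minimal sketch, assuming SDXL's 64-dim attention heads so that attn.scale = 64 ** -0.5 = 0.125 (consistent with the temperature=1.0 row):

```python
import math

base_scale = 1 / math.sqrt(64)  # attn.scale for 64-dim heads, i.e. 0.125
for i in range(1, 26):
    temperature = i / 10
    print(f"{temperature:.1f} -> {base_scale / temperature:.6f}")
```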
cc @asomoza do we need this?
Honestly, I don't see the use case for this. Maybe someone can give us a reason why there's a need to make something like SDXL generate low-resolution images instead of just switching to SD 1.5.
Even if there's a reason, there are already working solutions: for higher resolutions there's HiDiffusion, and for lower resolutions there's ResAdapter.
Honestly, it's incredibly shortsighted if you can't see the use case for low-resolution images from SDXL; resource usage has been the main friction in the switchover.
Plus, this simply extends functionality planned by @patrickvonplaten ~15 months ago; an argument would need to be added to control whether to use attn.scale anyway, so why not make it able to adjust the scale.
Feel free to close.
There's no need to start throwing adjectives at people just because they don't share the same opinion as you. There's also no need to close this; I'm still waiting for people to comment and discuss in a respectful way.
Just in case you don't know, there seems to be an SD3 version coming with 2B parameters that generates at 512px, precisely for people with low-end GPUs, so any effort to make SDXL use lower VRAM probably won't matter anymore.
We have the experience of SDXL Turbo, which no one actually uses because people want to generate 1024px images, so they just use Lightning, HyperSD, or Turbo merges.
There's also a recent discussion on Reddit where people were actually sad/mad because the first release of SD3 is probably going to be the 512px one.
If there was a plan to do this, it shouldn't really need a reason to be added. I don't really know why it wasn't; maybe @yiyixuxu or @sayakpaul can comment on this.
I am well aware of the alleged SD3 release, "soon" or "in two more weeks", whichever comes first. Whether SD3 generates at 512px is irrelevant: not everyone is going to suddenly switch to SD3, its overall requirements are still larger than SDXL's due to the text encoder, and the general consensus is "wait for finetunes", as with XL, anyway.
No one actually uses many features of Diffusers; it's literally part of the library's philosophy to be "tweakable", so why do you think the scale argument of scaled_dot_product_attention is something users should not use?
> If there was a plan to do this
Look at the comments in attention_processor.py. I shouldn't need to convince you to support something that was already planned to be supported. The only question is whether this can be supported yet, considering the content of the comment itself: "add support for attn.scale when we move to Torch 2.1".
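For reference, the scale keyword that TODO refers to landed in torch.nn.functional.scaled_dot_product_attention in PyTorch 2.1; a minimal sketch of passing a custom scale (tensor shapes are illustrative):

```python
import math
import torch
import torch.nn.functional as F

# Toy tensors shaped (batch, heads, seq_len, head_dim); the sizes are illustrative.
q, k, v = (torch.randn(1, 8, 77, 64) for _ in range(3))

temperature = 1.5
# Mirror the PR: attn.scale is head_dim ** -0.5 (0.125 here), divided by temperature.
custom_scale = (1 / math.sqrt(q.shape[-1])) / temperature

# `scale` requires torch >= 2.1; when omitted, SDPA uses 1 / sqrt(head_dim).
out = F.scaled_dot_product_attention(q, k, v, scale=custom_scale)
```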
My opinion was exclusively about the part where this was added to make SDXL generate at lower resolutions, which is the first two sentences of this PR. We both gave our opinions, and hopefully more people will.
I don't really need convincing about the attention scale or the temperature. I, for one, really like having more tweakable parameters in everything, but we can't just add every little tweak without a solid reason.
> The only question is whether this can be supported yet, considering the content of the comment itself: "add support for attn.scale when we move to Torch 2.1".
Indeed, so let's wait for the people who know this.
Hi @hlky
The attention class is a low-level abstraction; it is really not feasible for us to introduce new arguments for each specific use case like this one. So, unfortunately, we will not be able to support the temperature argument.
However, our library is designed to be "tweakable". We have a set_attn_processor method so that you can write your own custom attention processor without having to make any changes to the source code: https://huggingface.co/docs/diffusers/v0.28.0/en/api/models/unet2d-cond#diffusers.UNet2DConditionModel.set_attn_processor. Maybe we can write a doc page about this and use it as a simple example? cc @stevhliu
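As an illustration of that route, a hedged sketch of a standalone temperature processor (not part of diffusers; the class is hypothetical, and the group-norm, spatial-norm, and 4D-input branches of the stock AttnProcessor2_0 are omitted for brevity):

```python
import torch.nn.functional as F


class TemperatureAttnProcessor2_0:
    """Hypothetical processor: mirrors diffusers' AttnProcessor2_0, but divides
    the attention scale by a temperature before calling SDPA."""

    def __init__(self, temperature: float = 1.0):
        self.temperature = temperature

    def __call__(self, attn, hidden_states, encoder_hidden_states=None,
                 attention_mask=None, **kwargs):
        residual = hidden_states
        batch_size, sequence_length, _ = hidden_states.shape

        if attention_mask is not None:
            attention_mask = attn.prepare_attention_mask(attention_mask, sequence_length, batch_size)
            attention_mask = attention_mask.view(batch_size, attn.heads, -1, attention_mask.shape[-1])

        query = attn.to_q(hidden_states)
        if encoder_hidden_states is None:
            encoder_hidden_states = hidden_states
        elif attn.norm_cross:
            encoder_hidden_states = attn.norm_encoder_hidden_states(encoder_hidden_states)
        key = attn.to_k(encoder_hidden_states)
        value = attn.to_v(encoder_hidden_states)

        # Split heads: (batch, seq_len, inner_dim) -> (batch, heads, seq_len, head_dim)
        head_dim = query.shape[-1] // attn.heads
        query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
        key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
        value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)

        # The `scale` keyword needs torch >= 2.1; attn.scale is head_dim ** -0.5
        # by default, so temperature=1.0 reproduces standard attention.
        hidden_states = F.scaled_dot_product_attention(
            query, key, value, attn_mask=attention_mask,
            scale=attn.scale / self.temperature,
        )
        hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
        hidden_states = hidden_states.to(query.dtype)

        hidden_states = attn.to_out[0](hidden_states)  # output projection
        hidden_states = attn.to_out[1](hidden_states)  # dropout

        if attn.residual_connection:
            hidden_states = hidden_states + residual
        return hidden_states / attn.rescale_output_factor
```

It could then be installed with `pipe.unet.set_attn_processor(TemperatureAttnProcessor2_0(temperature=1.5))` before calling the pipeline, with no `cross_attention_kwargs` needed.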
About Patrick's comment on supporting attn.scale: sorry, I'm not aware of that plan, but I will look into it. Feel free to open a separate issue if you're interested in helping :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.