stable-diffusion-webui
Delete xformers attnblock
In various threads, there have been reports of xformers degrading quality. After some debugging, I narrowed it down to xformers_attnblock_forward.
I failed at fixing this and believe it's probably impossible to fix. I think it's some issue related to the strides of q, k, v: the CompVis code that rearranges tensors for this function produces uneven stride sizes, which makes it incompatible with memory_efficient_attention.
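To illustrate what I mean, here is a minimal NumPy sketch of the stride problem (the real code operates on torch tensors; this just mirrors the same reshape/permute):

```python
import numpy as np

b, c, h, w = 1, 512, 64, 64  # typical latent AttnBlock sizes
q = np.zeros((b, c, h, w), dtype=np.float32)
# The CompVis AttnBlock flattens the spatial dims and moves channels last,
# i.e. the PyTorch q.reshape(b, c, h*w).permute(0, 2, 1):
q2 = q.reshape(b, c, h * w).transpose(0, 2, 1)
# The result is a strided view, not contiguous memory: the last axis now
# steps h*w elements at a time instead of 1, which is exactly what
# memory_efficient_attention rejects (it wants a contiguous last dimension).
print(q2.strides[-1] // q2.itemsize)  # 4096, not 1
```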
Using the legacy attnblock_forward fixes the quality loss issue. It's 5% slower on my 4090.
cc. @danthe3rd Do you have any ideas about how we could properly implement xformers for this function?
Hi @C43H66N12O12S2 - thanks for the heads-up. Do you have some pointers to share on these "quality degradations" - is it something you have seen yourself? Do you have a way for me to reproduce the problem? Can you also share the version of xFormers you are using with "python -m xformers.info"?
Cc @fmassa
@danthe3rd The symptoms are washed out colors, and increased image noise. Some images suffer more than others.
Here is one example:
With xformers_attnblock_forward:
Without:
or,
With:
Without:
(ignore my botched snipping work and focus on the crystal(?).)
To reproduce, you could use this repository and generate an image at the same seed both with --xformers in COMMANDLINE_ARG and without.
WARNING:root:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.14.dev
memory_efficient_attention.flshatt: available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass: available
memory_efficient_attention.small_k: available
is_triton_available: False
is_functorch_available: False
pytorch.version: 1.12.1+cu116
pytorch.cuda: available
gpu.compute_capability: 8.9
gpu.name: NVIDIA GeForce RTX 4090
As another example, a user reported this (left side is without xformers_attnblock_forward):
To be clear, xformers as a whole is not responsible for this issue. My images were using xformers_attention_forward; it's only our usage of xformers in xformers_attnblock_forward that causes the issue. Both functions are inside modules/sd_hijack_optimizations.py.
I believe that if you messed up something with the dimensions, there would be much bigger differences right? (like image completely dark, or random)
Looking at the reference function cross_attention_attnblock_forward, it looks like the output of self.q(h_) has shape [b, c, h, w], where (I assume from reading the code) c is the embedding size per head and h/w refer to the image height/width (?) - so h*w is the "sequence length".
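A toy example of the shapes involved (a NumPy stand-in for the torch rearrange; the sizes are made up):

```python
import numpy as np

b, c, h, w = 2, 8, 4, 4  # toy sizes; real latents use e.g. c=512, h=w=64
t = np.zeros((b, c, h, w))
# 'b c h w -> b (h w) c': each spatial position becomes one sequence element
seq = t.reshape(b, c, h * w).transpose(0, 2, 1)
print(seq.shape)  # (2, 16, 8) = (batch, seq_len=h*w, embed=c)
```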
In that case, I would do something like this:
```python
def xformers_attnblock_forward(self, x):
    try:
        h_ = x
        h_ = self.norm(h_)
        q = self.q(h_)
        k = self.k(h_)
        v = self.v(h_)
        b, c, h, w = q.shape
        q, k, v = map(lambda t: rearrange(t, 'b c h w -> b (h w) c'), (q, k, v))
        out = xformers.ops.memory_efficient_attention(q, k, v)
        # XXX: Not sure what the output format is...
        out = rearrange(out, 'b (h w) c -> b c h w', h=h)
        out = self.proj_out(out)
        return x + out
    except NotImplementedError:
        return cross_attention_attnblock_forward(self, x)
```
That, uh, works, I think. I had actually written something extremely similar but saw no improvement. Your snippet works.
Thank you :)
Oh glad to see it working! I still don't understand how it could generate "good looking" images before while being entirely wrong lol
These reshapes will however incur an additional cost (in terms of compute), so it might be worth evaluating xFormers' speedup again.
Performance seems fine. Maybe 0.5% slower.
I'm not sure either. I think the AttnBlock forward is the step where the image is upscaled from latent space, so any error should've been exaggerated.
I personally choose to chalk it up to xformers' competency :)
xformers had an issue where SD could generate slightly different results on the same seed; maybe it was related to this.
Yeah, it is. That's the exact issue this PR will fix, actually.
Tried it and got RuntimeError: query: last dimension must be contiguous
in the call to xformers.memory_efficient_attention. Setting q, k, v to .contiguous() fixes it.
> Tried it and got
> `RuntimeError: query: last dimension must be contiguous`
> in the call to xformers.memory_efficient_attention. Setting q, k, v to .contiguous() fixes it.
This is no longer required with recent versions of xFormers (and has better performance), so the error should go away if you update it
Plus, in newer versions, setting them to contiguous on the first assignment will actually produce the error @hentailord85ez got.
> Tried it and got
> `RuntimeError: query: last dimension must be contiguous`
> in the call to xformers.memory_efficient_attention. Setting q, k, v to .contiguous() fixes it.
>
> This is no longer required with recent versions of xFormers (and has better performance), so the error should go away if you update it
Oops, my bad. Forget what I said. The newer versions of xFormers still require the last dimension to be contiguous. There should indeed be a contiguous() call.
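A NumPy analogue of what that .contiguous() call does - copying the strided view into dense memory so the last axis gets unit stride (torch's .contiguous() is the same idea):

```python
import numpy as np

b, c, h, w = 1, 8, 4, 4
q = np.zeros((b, c, h, w), dtype=np.float32)
# strided view after the 'b c h w -> b (h w) c' transpose: last axis stride != 1
seq = q.reshape(b, c, h * w).transpose(0, 2, 1)
# analogue of torch's .contiguous(): materialize a dense copy
fixed = np.ascontiguousarray(seq)
print(fixed.strides[-1] == fixed.itemsize)  # True
```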
Hm, the code as-is works on my machine, though.
Yes, I've just checked and I've got the 14.dev version, but still have the error.
WARNING:root:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.14.dev
memory_efficient_attention.flshatt: available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass: available
memory_efficient_attention.small_k: available
is_triton_available: False
is_functorch_available: False
pytorch.version: 1.12.1+cu113
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA GeForce RTX 3050 Laptop GPU
This fix didn't seem to do anything on my setup: GTX 1080, CUDA 11.6, PyTorch cu116, xFormers 0.0.14.dev0 (compiled for the GTX 1080, compute capability 6.1, with CUDA 11.6).
hi @C43H66N12O12S2, how do we rewrite this line

```python
q, k, v = map(lambda t: rearrange(t, 'b c h w -> b (h w) c'), (q, k, v))
```

without einops rearrange(), using permute() instead? Like this (from https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), where

```python
q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b n h d', h=h), (q, k, v))
```

becomes

```python
q, k, v = map(lambda t: t.reshape(t.shape[0], t.shape[1], self.heads, t.shape[2] // self.heads).permute(0, 2, 1, 3).reshape(t.shape[0] * self.heads, t.shape[1], t.shape[2] // self.heads), (q, k, v))
```
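For what it's worth, that reshape/permute chain can be sanity-checked without torch; here is a NumPy sketch of the full 'b n (h d) -> (b h) n d' pattern it implements (toy sizes, heads=4 assumed):

```python
import numpy as np

b, n, heads, d = 2, 5, 4, 3
t = np.arange(b * n * heads * d, dtype=np.float32).reshape(b, n, heads * d)
# rearrange(t, 'b n (h d) -> (b h) n d', h=heads) without einops:
out = (t.reshape(b, n, heads, d)    # split the fused head dim
        .transpose(0, 2, 1, 3)      # -> b h n d (torch: .permute)
        .reshape(b * heads, n, d))  # -> (b h) n d
print(out.shape)  # (8, 5, 3)
```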
This fix is working for me and it's great. I had thought the increased "noisiness" I was noticing this week on a certain landscape prompt I like to use was placebo, but now I know I was not imagining it. This commit completely resolves that issue, thanks! It also produces no noticeable slowdown on my 3080.
@camenduru why?
@C43H66N12O12S2 not working on Colab T4 😭 it only works like this, but limited to 75 tokens: https://github.com/AUTOMATIC1111/stable-diffusion-webui/compare/master...camenduru:stable-diffusion-webui:colab
Uh, those changes in your commit modify xformers_attention_forward, while this PR is about xformers_attnblock_forward. I also don't understand why it wouldn't work; every single attention in this repo uses rearrange.
@C43H66N12O12S2 rearrange with xformers isn't working, I don't know why. Yes, I changed xformers_attention_forward and it started working, but I want to change xformers_attnblock_forward too and I don't know how. If you know, it would be super cool.
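In case it helps: the two rearranges in xformers_attnblock_forward can be replaced by plain reshape/transpose. A NumPy sketch of the round trip (in torch you'd use .permute()/.reshape(), plus .contiguous() before the attention call, as discussed earlier in this thread):

```python
import numpy as np

b, c, h, w = 1, 6, 2, 2
t = np.arange(b * c * h * w, dtype=np.float32).reshape(b, c, h, w)
# 'b c h w -> b (h w) c' without einops
seq = np.ascontiguousarray(t.reshape(b, c, h * w).transpose(0, 2, 1))
# ... attention would run here on (batch, seq_len, channels) ...
# 'b (h w) c -> b c h w' to go back
back = seq.transpose(0, 2, 1).reshape(b, c, h, w)
print(np.array_equal(back, t))  # True - the mapping round-trips exactly
```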