
Delete xformers attnblock

C43H66N12O12S2 opened this issue

In various threads, there have been reports of xformers degrading quality. After some debugging, I narrowed it down to xformers_attnblock_forward.

I tried and failed to fix this, and I believe it's probably impossible to fix. I think it's an issue related to the strides of q, k, v: the CompVis code that rearranges the tensors for this function produces uneven strides, which makes them incompatible with memory_efficient_attention.
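To illustrate what I mean about the strides (a minimal sketch with hypothetical shapes, paraphrasing the CompVis-style reshape from memory):

import torch

# The attnblock path flattens [b, c, h, w] activations to [b, h*w, c] via a permute,
# which yields a strided view whose last dimension is not contiguous.
q = torch.randn(1, 512, 64, 64).reshape(1, 512, 64 * 64)  # [b, c, h*w]
q_view = q.permute(0, 2, 1)                                # [b, h*w, c], no copy
print(q_view.stride())          # (2097152, 1, 4096) - last dim stride is not 1
print(q_view.is_contiguous())   # False
print(q_view.contiguous().is_contiguous())  # True - an explicit copy fixes the layout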

Using the legacy attnblock_forward fixes the quality loss; it's about 5% slower on my 4090.

cc. @danthe3rd Do you have any ideas about how we could properly implement xformers for this function?

C43H66N12O12S2 avatar Oct 17 '22 16:10 C43H66N12O12S2

Hi @C43H66N12O12S2 - thanks for the heads-up. Do you have some pointers to share on these "quality degradations" - is it something you have seen yourself? Do you have a way for me to reproduce the problem? Can you also share the version of xFormers you are using with "python -m xformers.info"?

Cc @fmassa

danthe3rd avatar Oct 17 '22 17:10 danthe3rd

@danthe3rd The symptoms are washed-out colors and increased image noise. Some images suffer more than others. Here is one example:

With xformers_attnblock_forward: 00058-2725260896
Without: 00059-2725260896

And another comparison (images "with" / "without" - ignore my botched snipping work and focus on the crystal(?)).

To reproduce, you could use this repository and generate an image at the same seed both with the --xformers COMMANDLINE_ARG and without it.

WARNING:root:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.14.dev
memory_efficient_attention.flshatt:      available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass:      available
memory_efficient_attention.small_k:      available
is_triton_available:                     False
is_functorch_available:                  False
pytorch.version:                         1.12.1+cu116
pytorch.cuda:                            available
gpu.compute_capability:                  8.9
gpu.name:                                NVIDIA GeForce RTX 4090

C43H66N12O12S2 avatar Oct 17 '22 17:10 C43H66N12O12S2

As another example, a user reported this (left side is without xformers_attnblock_forward): 1666020303271565

To be clear, xformers as a whole is not responsible for this issue. My images were using xformers_attention_forward. It's only our usage of xformers in xformers_attnblock_forward that causes the issue. Both functions are in modules/sd_hijack_optimizations.py.

C43H66N12O12S2 avatar Oct 17 '22 17:10 C43H66N12O12S2

I believe that if something were messed up with the dimensions, there would be much bigger differences, right? (Like the image being completely dark, or random.)

Looking at the reference function cross_attention_attnblock_forward, it looks like the output of self.q(h_) has shape [b, c, h, w], where (I assume from reading the code) c is the embedding size per head, and h/w refer to the image height/width (?) - so h*w is the "sequence length". In that case, I would do something like this:

def xformers_attnblock_forward(self, x):
    try:
        h_ = x
        h_ = self.norm(h_)
        q = self.q(h_)
        k = self.k(h_)
        v = self.v(h_)
        b, c, h, w = q.shape
        q, k, v = map(lambda t: rearrange(t, 'b c h w -> b (h w) c'), (q, k, v))
        out = xformers.ops.memory_efficient_attention(q, k, v)
        # XXX: Not sure what the output format is...
        out = rearrange(out, 'b (h w) c -> b c h w', h=h)
        out = self.proj_out(out)
        return x + out
    except NotImplementedError:
        return cross_attention_attnblock_forward(self, x)

danthe3rd avatar Oct 17 '22 19:10 danthe3rd

That, uh, works, I think. I had actually written something extremely similar, but saw no improvement. Your snippet works. 00063-2725260896

Thank you :)

C43H66N12O12S2 avatar Oct 17 '22 19:10 C43H66N12O12S2

Oh glad to see it working! I still don't understand how it could generate "good looking" images before while being entirely wrong lol

danthe3rd avatar Oct 17 '22 19:10 danthe3rd

These reshapes will however incur an additional cost (in terms of compute), so it might be worth evaluating xFormers' speedup again.
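A rough way to sanity-check that overhead in isolation might be something like this (hypothetical shapes, and it assumes a CUDA device; a real benchmark would time full image generation):

import time
import torch

# Time just the extra rearrange + copy on a representative activation size.
x = torch.randn(1, 512, 64, 64, device='cuda')
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    y = x.permute(0, 2, 3, 1).reshape(1, 64 * 64, 512).contiguous()
torch.cuda.synchronize()
print(f"{(time.perf_counter() - t0) / 100 * 1e3:.3f} ms per rearrange + copy")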

danthe3rd avatar Oct 17 '22 19:10 danthe3rd

Performance seems fine. Maybe 0.5% slower.

I'm not sure either. I think the AttnBlock forward is the step where the image is upscaled from latent space, so any error should've been exaggerated.

I personally choose to chalk it up to xformers' competency :)

C43H66N12O12S2 avatar Oct 17 '22 19:10 C43H66N12O12S2

xformers had an issue where SD could generate slightly different results on the same seed; maybe it was related to this.

x02Sylvie avatar Oct 17 '22 19:10 x02Sylvie

Yeah, it is. That's the exact issue this PR will fix, actually.

C43H66N12O12S2 avatar Oct 17 '22 19:10 C43H66N12O12S2

Tried it and got RuntimeError: query: last dimension must be contiguous in the call to xformers.memory_efficient_attention. Setting q, k, v to .contiguous() fixes it.
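For reference, this is roughly what I ended up with (a sketch of the snippet above with the extra calls added, not a polished patch):

def xformers_attnblock_forward(self, x):
    try:
        h_ = self.norm(x)
        q = self.q(h_)
        k = self.k(h_)
        v = self.v(h_)
        b, c, h, w = q.shape
        # .contiguous() after the rearrange so the last dimension has stride 1
        q, k, v = map(lambda t: rearrange(t, 'b c h w -> b (h w) c').contiguous(), (q, k, v))
        out = xformers.ops.memory_efficient_attention(q, k, v)
        out = rearrange(out, 'b (h w) c -> b c h w', h=h)
        out = self.proj_out(out)
        return x + out
    except NotImplementedError:
        return cross_attention_attnblock_forward(self, x)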

hentailord85ez avatar Oct 17 '22 20:10 hentailord85ez

Tried it and got RuntimeError: query: last dimension must be contiguous in the call to xformers.memory_efficient_attention. Setting q, k, v to .contiguous() fixes it.

This is no longer required with recent versions of xFormers (and has better performance), so the error should go away if you update it

danthe3rd avatar Oct 17 '22 20:10 danthe3rd

Plus, in newer versions, setting them to contiguous on the first assignment will actually produce the error @hentailord85ez got.

C43H66N12O12S2 avatar Oct 17 '22 20:10 C43H66N12O12S2

Tried it and got RuntimeError: query: last dimension must be contiguous in the call to xformers.memory_efficient_attention. Setting q, k, v to .contiguous() fixes it.

This is no longer required with recent versions of xFormers (and has better performance), so the error should go away if you update it

Oops my bad. Forget what I said. The newer versions of xFormers still require the last dimension to be contiguous. There should be a contiguous() call indeed.

danthe3rd avatar Oct 17 '22 20:10 danthe3rd

Hm, the code as-is works on my machine, though.

C43H66N12O12S2 avatar Oct 17 '22 21:10 C43H66N12O12S2

Yes, I've just checked and I've got the 14.dev version, but still have the error.

WARNING:root:A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
xFormers 0.0.14.dev
memory_efficient_attention.flshatt:      available - requires GPU with compute capability 7.5+
memory_efficient_attention.cutlass:      available
memory_efficient_attention.small_k:      available
is_triton_available:                     False
is_functorch_available:                  False
pytorch.version:                         1.12.1+cu113
pytorch.cuda:                            available
gpu.compute_capability:                  8.6
gpu.name:                                NVIDIA GeForce RTX 3050 Laptop GPU

hentailord85ez avatar Oct 17 '22 21:10 hentailord85ez

This fix didn't seem to do anything on my setup: GTX 1080, CUDA 11.6, PyTorch cu116, xFormers 0.0.14.dev0 compiled for compute capability 6.1 (GTX 1080) with CUDA 11.6.

danielmm8888 avatar Oct 17 '22 23:10 danielmm8888

hi @C43H66N12O12S2, how would we rewrite this line:

q, k, v = map(lambda t: rearrange(t, 'b c h w -> b (h w) c'), (q, k, v))

without einops' rearrange(), using permute() instead?

like this: (from https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py)

q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b n h d', h=h), (q, k, v))

becomes

q, k, v = map(lambda t: t.reshape(t.shape[0], t.shape[1], self.heads, t.shape[2] // self.heads).permute(0, 2, 1, 3).reshape(t.shape[0] * self.heads, t.shape[1], t.shape[2] // self.heads), (q, k, v))
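For the attnblock, would something like this be the permute()/reshape() equivalent? (Just a sketch, assuming q/k/v are the [b, c, h, w] tensors from the snippet above.)

b, c, h, w = q.shape
# 'b c h w -> b (h w) c' without einops
q = q.permute(0, 2, 3, 1).reshape(b, h * w, c).contiguous()
k = k.permute(0, 2, 3, 1).reshape(b, h * w, c).contiguous()
v = v.permute(0, 2, 3, 1).reshape(b, h * w, c).contiguous()
out = xformers.ops.memory_efficient_attention(q, k, v)
# 'b (h w) c -> b c h w' for the way back
out = out.reshape(b, h, w, c).permute(0, 3, 1, 2)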

camenduru avatar Oct 18 '22 01:10 camenduru

This fix is working for me and it's great. I had thought the increased "noisiness" I was noticing this week on a certain landscape prompt I like to use was placebo, but now I know I wasn't imagining it. This commit completely resolves that issue, thanks! It also produces no noticeable slowdown on my 3080.

clockworkwhale avatar Oct 18 '22 08:10 clockworkwhale

@camenduru why?

C43H66N12O12S2 avatar Oct 18 '22 08:10 C43H66N12O12S2

@C43H66N12O12S2 not working on Colab T4 😭 it only works like this, but limited to 75 tokens: https://github.com/AUTOMATIC1111/stable-diffusion-webui/compare/master...camenduru:stable-diffusion-webui:colab

camenduru avatar Oct 18 '22 08:10 camenduru

Uh, those changes in your commit modify xformers_attention_forward; this PR is about xformers_attnblock_forward. I also don't understand why it wouldn't work - every single attention in this repo uses rearrange.

C43H66N12O12S2 avatar Oct 18 '22 09:10 C43H66N12O12S2

@C43H66N12O12S2 rearrange with xformers isn't working, I don't know why. Yes, I changed xformers_attention_forward and it started working, but I want to change xformers_attnblock_forward too and I don't know how - if you know, it would be super cool.

camenduru avatar Oct 18 '22 09:10 camenduru