Failed to reproduce attention result with PyTorch
❓ Questions and Help
I tried to reproduce the result of memory_efficient_attention with plain torch by following the API docs, but it didn't work. Is this caused by my own negligence, or is it a bug?
Here is my test script, running with torch==2.1.2+cu121 and xformers==0.0.23.post1:
import torch
import torch.nn as nn
import xformers.ops
q = torch.rand(1, 2, 8, 4).cuda()
k = torch.rand(1, 2, 8, 4).cuda()
v = torch.rand(1, 2, 8, 4).cuda()
# Manual scaled dot-product attention
scale = 1 / q.shape[-1] ** 0.5
scores = torch.matmul(q * scale, k.transpose(-2, -1))
attn_weights_v1 = nn.functional.softmax(scores, dim=-1)
attn_weights_v1 = nn.functional.dropout(attn_weights_v1, p=0)
attn_weights_v1 = torch.matmul(attn_weights_v1, v)

# xformers memory-efficient attention
attn_weights_v2 = xformers.ops.memory_efficient_attention(query=q, key=k, value=v, p=0)
assert (attn_weights_v1 == attn_weights_v2).any()  # returns False, but I expected True
Hi @asdfd2013, I don't see any bug from looking at your code; I expect attn_weights_v1 and attn_weights_v2 to be very close. However, they won't be identical, due to how the numerical operations are performed.
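As a check, something like torch.allclose is the usual way to compare two floating-point results rather than exact equality. A minimal sketch, reusing attn_weights_v1 and attn_weights_v2 from the script above (the rtol/atol values are illustrative, not prescribed by xformers, and this will only pass once both paths really compute the same attention):

# Exact equality almost never holds between two floating-point implementations;
# compare within a tolerance instead
assert torch.allclose(attn_weights_v1, attn_weights_v2, rtol=1e-5, atol=1e-6)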
@danthe3rd Hi, I think the query should have shape (batch_size, seq_len, num_heads, head_dim) when used as input to xformers.ops.memory_efficient_attention, whereas it should have shape (batch_size, num_heads, seq_len, head_dim) for the normal attention computation. The provided code seems to use the wrong shape for the normal attention computation. Is that right?
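If that is the case, transposing the sequence and head dimensions before the manual computation should make the two results agree to within floating-point tolerance. A minimal sketch of the corrected comparison (variable names and rtol/atol values are illustrative):

import torch
import torch.nn as nn
import xformers.ops

# Inputs in the layout xformers expects: (batch, seq_len, num_heads, head_dim)
q = torch.rand(1, 2, 8, 4).cuda()
k = torch.rand(1, 2, 8, 4).cuda()
v = torch.rand(1, 2, 8, 4).cuda()

# Manual attention in the usual (batch, num_heads, seq_len, head_dim) layout
q_, k_, v_ = (t.transpose(1, 2) for t in (q, k, v))
scale = 1 / q_.shape[-1] ** 0.5
scores = torch.matmul(q_ * scale, k_.transpose(-2, -1))
attn = nn.functional.softmax(scores, dim=-1)
ref = torch.matmul(attn, v_).transpose(1, 2)  # back to (batch, seq_len, num_heads, head_dim)

out = xformers.ops.memory_efficient_attention(query=q, key=k, value=v, p=0)

# Close within floating-point tolerance, but not bit-identical
assert torch.allclose(ref, out, rtol=1e-4, atol=1e-6)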