Failed to reproduce attention result with PyTorch
❓ Questions and Help
I tried to reproduce the result of memory_efficient_attention with plain torch by following the API docs, but it didn't work. Is this caused by my own negligence, or is it a bug?
Here is my test script, running with torch==2.1.2+cu121 and xformers==0.0.23.post1:
import torch
import torch.nn as nn
import xformers.ops
q = torch.rand(1, 2, 8, 4).cuda()
k = torch.rand(1, 2, 8, 4).cuda()
v = torch.rand(1, 2, 8, 4).cuda()
# Manual scaled dot-product attention
scale = 1 / q.shape[-1] ** 0.5
scores = torch.matmul(q * scale, k.transpose(-2, -1))
attn_weights_v1 = nn.functional.softmax(scores, dim=-1)
attn_weights_v1 = nn.functional.dropout(attn_weights_v1, p=0)
attn_weights_v1 = torch.matmul(attn_weights_v1, v)

# xformers memory-efficient attention
attn_weights_v2 = xformers.ops.memory_efficient_attention(query=q, key=k, value=v, p=0)
assert (attn_weights_v1 == attn_weights_v2).any()  # returns False, but I expected True
Hi @asdfd2013, I don't see any bug from looking at your code; I expect attn_weights_v1 and attn_weights_v2 to be very close. However, they won't be identical, due to how the numerical operations are performed.
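As a check, something like torch.allclose is the usual way to compare two floating-point results rather than exact equality. A minimal sketch, reusing attn_weights_v1 and attn_weights_v2 from the script above (the rtol/atol values are illustrative, not prescribed by xformers, and this will only pass once both paths really compute the same attention):

# Exact equality almost never holds between two floating-point implementations;
# compare within a tolerance instead
assert torch.allclose(attn_weights_v1, attn_weights_v2, rtol=1e-5, atol=1e-6)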
@danthe3rd Hi, I think the query should have shape (batch_size, seq_len, num_heads, head_dim) when used as input to xformers.ops.memory_efficient_attention, whereas it should have shape (batch_size, num_heads, seq_len, head_dim) for the normal attention computation. The provided code seems to use the wrong shape for the normal attention computation. Is that right?
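If that is the case, transposing the sequence and head dimensions before the manual computation should make the two results agree to within floating-point tolerance. A minimal sketch of the corrected comparison (variable names and rtol/atol values are illustrative):

import torch
import torch.nn as nn
import xformers.ops

# Inputs in the layout xformers expects: (batch, seq_len, num_heads, head_dim)
q = torch.rand(1, 2, 8, 4).cuda()
k = torch.rand(1, 2, 8, 4).cuda()
v = torch.rand(1, 2, 8, 4).cuda()

# Manual attention in the usual (batch, num_heads, seq_len, head_dim) layout
q_, k_, v_ = (t.transpose(1, 2) for t in (q, k, v))
scale = 1 / q_.shape[-1] ** 0.5
scores = torch.matmul(q_ * scale, k_.transpose(-2, -1))
attn = nn.functional.softmax(scores, dim=-1)
ref = torch.matmul(attn, v_).transpose(1, 2)  # back to (batch, seq_len, num_heads, head_dim)

out = xformers.ops.memory_efficient_attention(query=q, key=k, value=v, p=0)

# Close within floating-point tolerance, but not bit-identical
assert torch.allclose(ref, out, rtol=1e-4, atol=1e-6)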