SparseBEV
Scale-adaptive self-attention
Hello authors,
Efficient implementation equivalent to the following (from the PyTorch docs for `scaled_dot_product_attention`):

```python
attn_mask = torch.ones(L, S, dtype=torch.bool).tril(diagonal=0) if is_causal else attn_mask
if attn_mask.dtype == torch.bool:
    # convert a boolean mask into an additive float mask: False -> -inf, True -> 0
    attn_mask = torch.zeros_like(attn_mask, dtype=Q.dtype).masked_fill(~attn_mask, float('-inf'))
attn_weight = torch.softmax((Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))) + attn_mask, dim=-1)
attn_weight = torch.dropout(attn_weight, dropout_p, train=True)
return attn_weight @ V
```
According to the torch source, the `attn_mask` is added directly to the scaled $QK^\top$ logits when computing `attn_weight`.
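A quick self-contained check of that additive behavior (toy shapes and variable names are my own, not from either codebase):

```python
import math
import torch
import torch.nn.functional as F

B, H, L, D = 2, 8, 4, 32
Q = torch.randn(B, H, L, D)
K = torch.randn(B, H, L, D)
V = torch.randn(B, H, L, D)
attn_mask = torch.randn(B, H, L, L)  # float mask

# Manual path, mirroring the docs pseudocode (dropout_p = 0):
manual = torch.softmax(Q @ K.transpose(-2, -1) / math.sqrt(D) + attn_mask, dim=-1) @ V

# Fused path:
fused = F.scaled_dot_product_attention(Q, K, V, attn_mask=attn_mask)

print(torch.allclose(manual, fused, atol=1e-5))  # True: a float mask is simply added
```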
In SparseBEV, the attention mask is constructed as follows:

```python
tau = tau.permute(0, 2, 1)
attn_mask = dist[:, None, :, :] * tau[..., None]  # [B, 8, Q, Q]
if pre_attn_mask is not None:  # for query denoising
    attn_mask[:, :, pre_attn_mask] = float('-inf')
attn_mask = attn_mask.flatten(0, 1)  # [Bx8, Q, Q]
```
According to the SparseBEV source, this `attn_mask` corresponds to $\tau D$.
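To make the shapes concrete, here is a self-contained toy version of that construction (the values and the `centers`/`torch.cdist` stand-ins are my own assumptions, not the repo's code, and I leave the sign convention of `dist` aside):

```python
import torch
import torch.nn as nn

B, Q, H, C = 2, 6, 8, 256                           # batch, queries, heads, channels (toy values)

centers = torch.randn(B, Q, 2)                      # stand-in for decoded query centers in BEV
dist = torch.cdist(centers, centers)                # [B, Q, Q] pairwise center distances
tau = torch.rand(B, Q, H)                           # stand-in for the per-query, per-head tau

tau = tau.permute(0, 2, 1)                          # [B, H, Q]
attn_mask = dist[:, None, :, :] * tau[..., None]    # [B, H, Q, Q]
attn_mask = attn_mask.flatten(0, 1)                 # [B*H, Q, Q]

# nn.MultiheadAttention accepts a float mask of shape [B*num_heads, L, S]
# and adds it to the attention logits before the softmax.
mha = nn.MultiheadAttention(C, num_heads=H, batch_first=True)
x = torch.randn(B, Q, C)
out, _ = mha(x, x, x, attn_mask=attn_mask)
print(out.shape)                                    # torch.Size([2, 6, 256])
```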
But this does not match the formula in the paper: the paper's formula uses subtraction, while the code above amounts to addition. Am I misunderstanding something?
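To spell out the mismatch as I read it, the paper's scale-adaptive self-attention is

$$
\mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}} - \tau D\right) V,
$$

while the code path above effectively computes

$$
\mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}} + \underbrace{\tau D}_{\texttt{attn\_mask}}\right) V.
$$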