metaformer
Question regarding Equation 5 (Random Mixing) in the paper
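Shouldn't the order of the matrix multiplication be swapped, i.e. changed from the first form to the second?
$$ \mathrm{RandomMixing}(X) = X W_R \quad \longrightarrow \quad \mathrm{RandomMixing}(X) = W_R X $$
I think the dimensions don't work in the original equation: $X$ is an $N \times C$ matrix and $W_R$ is an $N \times N$ matrix, so $X W_R$ is only defined when $C = N$ (the inner dimensions must match). By contrast, $W_R X$ is $(N \times N)(N \times C) \to N \times C$, which mixes the $N$ tokens. Secondly, the source code (quoted below) appears to do exactly that: it multiplies the weight matrix by the input matrix, i.e. it computes $W_R X$.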
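The dimension argument can be checked quickly. The NumPy sketch below uses illustrative sizes (N = 4 tokens, C = 3 channels, deliberately chosen so that C ≠ N); the sizes and the helper `try_matmul` are hypothetical and not taken from the paper or the repository.

```python
import numpy as np

N, C = 4, 3  # illustrative sizes: N tokens, C channels (C != N on purpose)
rng = np.random.default_rng(0)
X = rng.random((N, C))    # token features, N x C
W_R = rng.random((N, N))  # token-mixing matrix, N x N


def try_matmul(a, b):
    """Return the shape of a @ b, or None if the inner dimensions disagree."""
    try:
        return (a @ b).shape
    except ValueError:  # NumPy raises ValueError on a core-dimension mismatch
        return None


print("X @ W_R :", try_matmul(X, W_R))  # inner dims C=3 vs N=4 -> None
print("W_R @ X :", try_matmul(W_R, X))  # (N x N)(N x C) -> (4, 3)
```

Note that the shape error is only a symptom: if C happened to equal N, $X W_R$ would run, but it would mix channels within each token rather than mixing tokens.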
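```python
import torch
import torch.nn as nn


class RandomMixing(nn.Module):
    def __init__(self, num_tokens=196, **kwargs):
        super().__init__()
        # Fixed (frozen) N x N token-mixing matrix; each row sums to 1
        self.random_matrix = nn.parameter.Parameter(
            data=torch.softmax(torch.rand(num_tokens, num_tokens), dim=-1),
            requires_grad=False)

    def forward(self, x):
        B, H, W, C = x.shape
        # Flatten the H x W grid into N = H*W tokens: (B, N, C)
        x = x.reshape(B, H*W, C)
        # out[b, m, c] = sum_n R[m, n] * x[b, n, c], i.e. W_R X for each sample
        x = torch.einsum('mn, bnc -> bmc', self.random_matrix, x)
        x = x.reshape(B, H, W, C)
        return x
```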
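To confirm that the `einsum` in `forward` is a left-multiplication, the sketch below reproduces the same contraction in NumPy (whose `einsum` uses the same subscript notation as `torch.einsum`) and compares it with `R @ x`, which broadcasts the $N \times N$ matrix over the batch. In index form, $\text{out}_{bmc} = \sum_n (W_R)_{mn} X_{bnc} = (W_R X_b)_{mc}$. The sizes are arbitrary, and a NumPy row-softmax stands in for `torch.softmax(..., dim=-1)`; this is an illustration, not the repository's test code.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, W, C = 2, 3, 3, 5  # arbitrary sizes; C != N on purpose
N = H * W                # number of tokens

# Row-wise softmax of a uniform random N x N matrix (mirrors the module's init)
logits = rng.random((N, N))
R = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Input in (B, H, W, C) layout, flattened to (B, N, C) as in forward()
x = rng.random((B, H, W, C)).reshape(B, N, C)

# Same contraction as the module: out[b, m, c] = sum_n R[m, n] * x[b, n, c]
mixed = np.einsum('mn,bnc->bmc', R, x)

# Left-multiplication W_R X, with R broadcast over the batch dimension
left = R @ x

print(mixed.shape)                       # (2, 9, 5)
print(np.allclose(mixed, left))          # True
print(np.allclose(R.sum(axis=-1), 1.0))  # True: each row of R sums to 1
```

Because `R @ x` matches the `einsum`, while `x @ R` is not even defined here (C = 5, N = 9), the code computes $W_R X$, consistent with the proposed correction to Equation 5.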