Question regarding Equation 5 Random Mixing in the Paper

Open f-fuchs opened this issue 9 months ago • 1 comments

Should the matrix multiplication not be swapped?

$$ \mathrm{RandomMixing}(X) = X W_R \quad\Rightarrow\quad \mathrm{RandomMixing}(X) = W_R X $$

I think the dimensions don't work in the original equation, because it multiplies an $N \times C$ matrix $X$ by an $N \times N$ matrix $W_R$, and $X W_R$ is only defined when $C = N$. Secondly, in the source code I think you are actually multiplying the input matrix by the weight matrix from the left, i.e. computing $W_R X$:

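```python
import torch
import torch.nn as nn


class RandomMixing(nn.Module):
    def __init__(self, num_tokens=196, **kwargs):
        super().__init__()
        # Fixed (non-trainable) N x N mixing matrix; each row sums to 1.
        self.random_matrix = nn.parameter.Parameter(
            data=torch.softmax(torch.rand(num_tokens, num_tokens), dim=-1),
            requires_grad=False)

    def forward(self, x):
        B, H, W, C = x.shape
        x = x.reshape(B, H*W, C)  # (B, N, C) with N = H*W tokens
        # Sums over the token index n: out[b, m, c] = sum_n W_R[m, n] * x[b, n, c]
        x = torch.einsum('mn, bnc -> bmc', self.random_matrix, x)
        x = x.reshape(B, H, W, C)
        return x
```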
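
To make the shape argument concrete, here is a minimal check I put together. The sizes `B`, `N`, `C` and the variable names are my own choices for illustration, not values from the repository. It compares the einsum from `forward` against an explicit left multiplication and tries the right multiplication from the equation as written in the paper.

```python
import torch

# Hypothetical small sizes chosen for illustration (N tokens, C channels, C != N).
B, N, C = 2, 4, 3

torch.manual_seed(0)
X = torch.rand(B, N, C)                        # input tokens, shape (B, N, C)
W_R = torch.softmax(torch.rand(N, N), dim=-1)  # mixing matrix, shape (N, N)

# The contraction used in RandomMixing.forward.
mixed_einsum = torch.einsum('mn, bnc -> bmc', W_R, X)

# Explicit left multiplication W_R X, broadcast over the batch dimension.
mixed_left = torch.matmul(W_R, X)
same_as_left = torch.allclose(mixed_einsum, mixed_left)

# Right multiplication X W_R: inner dimensions are C and N, which differ here.
try:
    torch.matmul(X, W_R)
    right_defined = True
except RuntimeError:
    right_defined = False

print("einsum equals W_R X:", same_as_left)
print("X W_R is defined:", right_defined)
```

If this is right, the einsum matches $W_R X$ per batch element, and $X W_R$ only goes through in the special case $C = N$.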

f-fuchs, May 23 '24 14:05