Machine-Learning-Collection
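The line
attention = torch.softmax(energy / (self.embed_size ** (1 / 2)), dim=3)
should be
attention = torch.softmax(energy / (self.head_dim ** (1 / 2)), dim=3)
Scaled dot-product attention divides the scores by the square root of the key dimension d_k, which for multi-head attention is the per-head size (head_dim), not the full embed_size.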
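To illustrate why the divisor matters, here is a minimal sketch in plain Python (no torch, so it runs standalone). The sizes `embed_size = 256` and `heads = 8`, the toy query and key vectors, and the helper names `softmax` and `attention_weights` are assumptions made up for this example; they are not taken from the repository.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys, scale_dim):
    # energy[i] = dot(query, keys[i]); scores are divided by sqrt(scale_dim).
    energy = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax([e / math.sqrt(scale_dim) for e in energy])

# Hypothetical sizes, chosen only for illustration.
embed_size = 256
heads = 8
head_dim = embed_size // heads  # each head sees vectors of this length

# Toy per-head vectors of length head_dim.
query = [1.0] * head_dim
keys = [[1.0] * head_dim, [0.5] * head_dim, [0.0] * head_dim]

correct = attention_weights(query, keys, head_dim)     # divide by sqrt(d_k)
too_flat = attention_weights(query, keys, embed_size)  # divide by sqrt(embed_size)

print("scaled by sqrt(head_dim):  ", correct)
print("scaled by sqrt(embed_size):", too_flat)
```

Because `embed_size` is `heads` times larger than `head_dim`, dividing by its square root shrinks the scores by an extra factor of `sqrt(heads)`, so the softmax output is flatter than the standard formulation intends.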