Sergey Shvets
Results: 1 issue of Sergey Shvets
Proposal: add an assert statement to the `MaskedSelfAttention` class to verify that the dim size is divisible by the number of attention heads. Otherwise, if `self.dim` is not divisible by `self.num_heads`...
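
A minimal sketch of the proposed check, for illustration only. The constructor signature and the `head_dim` attribute are assumptions, not taken from the actual `MaskedSelfAttention` class; only the names `dim` and `num_heads` come from the issue. The class below is a hypothetical stand-in with no attention logic.

```python
class MaskedSelfAttention:
    """Hypothetical stand-in: only the constructor check is sketched here."""

    def __init__(self, dim: int, num_heads: int) -> None:
        # Fail fast at construction time: every head needs an equal,
        # integer-sized slice of the model dimension.
        assert dim % num_heads == 0, (
            f"dim ({dim}) must be divisible by num_heads ({num_heads})"
        )
        self.dim = dim
        self.num_heads = num_heads
        # Assumed per-head width; integer division is exact after the assert.
        self.head_dim = dim // num_heads


# Usage: a valid configuration passes, an invalid one is rejected early.
attn = MaskedSelfAttention(dim=64, num_heads=8)

try:
    MaskedSelfAttention(dim=65, num_heads=8)
except AssertionError as err:
    print(f"rejected: {err}")
```

Note that Python strips `assert` statements when run with `-O`, so raising a `ValueError` may be preferable if the check must always run.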