Sergey Shvets

Results: 1 issue by Sergey Shvets

Propose adding an assert statement to the `MaskedSelfAttention` class to verify that the embedding dimension is evenly divisible by the number of attention heads. Otherwise, if `self.dim` is not divisible by `self.num_heads`...
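
A minimal sketch of what such a check might look like. The class below is a hypothetical stand-in, not the project's actual `MaskedSelfAttention`: the constructor signature and the `head_dim` attribute are assumptions made for illustration, and only the divisibility check is shown.

```python
class MaskedSelfAttention:
    """Hypothetical stand-in; only the constructor check is sketched."""

    def __init__(self, dim: int, num_heads: int) -> None:
        # Fail early with a clear message instead of hitting a shape
        # error later, when the embedding is split across heads.
        assert dim % num_heads == 0, (
            f"dim ({dim}) must be divisible by num_heads ({num_heads})"
        )
        self.dim = dim
        self.num_heads = num_heads
        # Assumed attribute: the per-head size after an even split.
        self.head_dim = dim // num_heads


attn = MaskedSelfAttention(dim=768, num_heads=12)
print(attn.head_dim)  # 64

try:
    MaskedSelfAttention(dim=768, num_heads=10)
except AssertionError as err:
    print(err)  # dim (768) must be divisible by num_heads (10)
```

Note that `assert` statements are skipped when Python runs with `-O`, so raising a `ValueError` could be a reasonable alternative if the check must always run.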