Stand-Alone-Self-Attention
The wrong implementation of the inner-product operation
In Equation 2 of the paper, the query and the key are combined with an inner-product (dot-product) operation, not an element-wise multiplication.
So the following line
https://github.com/leaderj1001/Stand-Alone-Self-Attention/blob/e0a168ef8d4a7b93ae706a7d7c68b982e112821e/attention.py#L48
should be
out = (q_out * k_out).sum(dim=2)
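A minimal sketch of the difference, assuming the tensor layout used in attention.py (q_out of shape (batch, groups, channels_per_group, height, width, 1) and k_out of shape (batch, groups, channels_per_group, height, width, kernel*kernel); the concrete sizes below are just illustrative):

```python
import torch

# Toy shapes, assumed to match the layout in attention.py
batch, groups, c_per_g, h, w, k = 2, 4, 8, 5, 5, 9
q_out = torch.randn(batch, groups, c_per_g, h, w, 1)
k_out = torch.randn(batch, groups, c_per_g, h, w, k)

# Current line: element-wise product, one score per channel
scores_pointwise = q_out * k_out              # (b, g, c, h, w, k)

# Proposed fix: sum over the channel dimension, giving a true
# query-key inner product with one score per head (group)
scores_inner = (q_out * k_out).sum(dim=2)     # (b, g, h, w, k)

print(scores_pointwise.shape)  # torch.Size([2, 4, 8, 5, 5, 9])
print(scores_inner.shape)      # torch.Size([2, 4, 5, 5, 9])
```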
I found the same problem. It seems the implementation in the code is equivalent to having #attention heads = #embed dimensions.
@XiaLiPKU How would that modify lines 49 and 50?
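One possible adjustment, assuming those subsequent lines apply a softmax over the last dimension and then contract the attention map with v_out (this is a guess at the surrounding code, not the author's confirmed fix): since the per-head inner product drops the channel dimension from the attention map, the value contraction has to broadcast the weights across channels.

```python
import torch
import torch.nn.functional as F

# Assumed shapes, following the layout sketched above
batch, groups, c_per_g, h, w, k = 2, 4, 8, 5, 5, 9
q_out = torch.randn(batch, groups, c_per_g, h, w, 1)
k_out = torch.randn(batch, groups, c_per_g, h, w, k)
v_out = torch.randn(batch, groups, c_per_g, h, w, k)

# Softmax over the k spatial offsets; the channel dim is now gone
attn = F.softmax((q_out * k_out).sum(dim=2), dim=-1)    # (b, g, h, w, k)

# Broadcast the per-head weights over the channel dimension of v_out
out = torch.einsum('bghwk,bgchwk->bgchw', attn, v_out)  # (b, g, c, h, w)
out = out.view(batch, -1, h, w)                         # (b, out_channels, h, w)
print(out.shape)  # torch.Size([2, 32, 5, 5])
```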
@20171130 That was also my first thought, but then there is an inconsistency with the "groups" definition (used to replicate the "attention heads") throughout the paper and the code.
Anyway, your alternative implementation helped me understand the general concepts: https://github.com/20171130/AttentionLite/blob/master/model.py