Stand-Alone-Self-Attention
The wrong implementation of the inner-product operation
In Equation 2 of the paper, the query and the key are combined with an inner-product (dot-product) operation, not an element-wise multiplication.
So the following line
https://github.com/leaderj1001/Stand-Alone-Self-Attention/blob/e0a168ef8d4a7b93ae706a7d7c68b982e112821e/attention.py#L48
should be
out = (q_out * k_out).sum(dim=2)
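A minimal sketch of the difference, assuming the tensor layout used in attention.py (q_out of shape (batch, groups, channels_per_group, height, width, 1) and k_out of shape (batch, groups, channels_per_group, height, width, kernel*kernel); the concrete sizes below are just illustrative):

```python
import torch

# Toy shapes, assumed to match the layout in attention.py
batch, groups, c_per_g, h, w, k = 2, 4, 8, 5, 5, 9
q_out = torch.randn(batch, groups, c_per_g, h, w, 1)
k_out = torch.randn(batch, groups, c_per_g, h, w, k)

# Current line: element-wise product, one score per channel
scores_pointwise = q_out * k_out              # (b, g, c, h, w, k)

# Proposed fix: sum over the channel dimension, giving a true
# query-key inner product with one score per head (group)
scores_inner = (q_out * k_out).sum(dim=2)     # (b, g, h, w, k)

print(scores_pointwise.shape)  # torch.Size([2, 4, 8, 5, 5, 9])
print(scores_inner.shape)      # torch.Size([2, 4, 5, 5, 9])
```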
I found the same problem. It seems the implementation in the code is equivalent to having #attention heads = #embed dimensions.
@XiaLiPKU How would that modify lines 49 and 50?
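One possible adjustment, assuming those subsequent lines apply a softmax over the last dimension and then contract the attention map with v_out (this is a guess at the surrounding code, not the author's confirmed fix): since the per-head inner product drops the channel dimension from the attention map, the value contraction has to broadcast the weights across channels.

```python
import torch
import torch.nn.functional as F

# Assumed shapes, following the layout sketched above
batch, groups, c_per_g, h, w, k = 2, 4, 8, 5, 5, 9
q_out = torch.randn(batch, groups, c_per_g, h, w, 1)
k_out = torch.randn(batch, groups, c_per_g, h, w, k)
v_out = torch.randn(batch, groups, c_per_g, h, w, k)

# Softmax over the k spatial offsets; the channel dim is now gone
attn = F.softmax((q_out * k_out).sum(dim=2), dim=-1)    # (b, g, h, w, k)

# Broadcast the per-head weights over the channel dimension of v_out
out = torch.einsum('bghwk,bgchwk->bgchw', attn, v_out)  # (b, g, c, h, w)
out = out.view(batch, -1, h, w)                         # (b, out_channels, h, w)
print(out.shape)  # torch.Size([2, 32, 5, 5])
```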
@20171130 That was also my first thought, but then there is an inconsistency with the "groups" definition (used to replicate the "attention heads") throughout the paper and the code.
Anyway, your alternative implementation helped me understand the general concepts: https://github.com/20171130/AttentionLite/blob/master/model.py