keras-self-attention
Question about the SeqSelfAttention.
My question is: for the additive self-attention approach, are word embeddings from other timesteps taken into account when calculating the attention weights, or only those from the current timestep (meaning the word embeddings of the current sentence/input)?
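To make the question concrete, here is a minimal NumPy sketch of generic additive (Bahdanau-style) self-attention over a single input sequence. It is not the library's code: the function name `additive_self_attention` and the parameters `W_t`, `W_x`, `b_h`, `w_a`, `b_a` are assumptions made for illustration. In this sketch, the weights for timestep `t` are computed from every timestep `t'` of the same sequence, and other sequences are never involved.

```python
import numpy as np

def additive_self_attention(x, W_t, W_x, b_h, w_a, b_a):
    """Generic additive self-attention over one sequence (illustrative only).

    x: (seq_len, dim) embeddings of a single input sequence.
    Returns (output, weights) with shapes (seq_len, dim) and (seq_len, seq_len).
    """
    q = x @ W_t  # (seq_len, units), one projection per "query" timestep t
    k = x @ W_x  # (seq_len, units), one projection per "key" timestep t'
    # h[t, t'] combines timestep t with every timestep t' of the same sequence.
    h = np.tanh(q[:, None, :] + k[None, :, :] + b_h)  # (seq_len, seq_len, units)
    e = h @ w_a + b_a                                  # (seq_len, seq_len) scores
    # Row-wise softmax: weights[t] is a distribution over all timesteps t'.
    e = e - e.max(axis=-1, keepdims=True)
    weights = np.exp(e)
    weights /= weights.sum(axis=-1, keepdims=True)
    output = weights @ x                               # weighted sum of embeddings
    return output, weights

# Hypothetical sizes and random parameters, just to exercise the function.
rng = np.random.default_rng(0)
seq_len, dim, units = 5, 4, 3
x = rng.normal(size=(seq_len, dim))
W_t = rng.normal(size=(dim, units))
W_x = rng.normal(size=(dim, units))
b_h = np.zeros(units)
w_a = rng.normal(size=units)
b_a = 0.0

output, weights = additive_self_attention(x, W_t, W_x, b_h, w_a, b_a)
print(weights.shape)  # (seq_len, seq_len): one row of weights per timestep
```

If `SeqSelfAttention` behaves like this sketch, changing the embedding at one timestep would change the attention weights of all other timesteps in the same input, but that is exactly what I would like confirmed.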