AoANet p_att_feats.narrow(2,0, self.multi_head_scale * self.d

Why p_att_feats.narrow(2, 0, self.multi_head_scale * self.d_model)? This is line 170 in AoAModel.py.

Open d049 opened this issue 4 years ago • 3 comments

Jul 06 '20 13:07 d049

This splits K (Key) and V (Value) for the attention module of the decoder. K and V are both linear transformations of the feature vectors.
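As a rough illustration of that split, the sketch below projects the features once into a tensor that is twice as wide as a single K or V, then takes the first half as K and the second half as V with narrow. The layer name kv_proj and all sizes are made up for the example and are not taken from AoAModel.py.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only; AoANet uses different ones.
batch, regions, feat_dim, d_model = 2, 5, 16, 8

# Hypothetical single projection that produces K and V side by side.
kv_proj = nn.Linear(feat_dim, 2 * d_model)

att_feats = torch.randn(batch, regions, feat_dim)
p_att_feats = kv_proj(att_feats)  # shape: (batch, regions, 2 * d_model)

# narrow(dim, start, length) keeps `length` entries along `dim`, beginning at `start`.
keys = p_att_feats.narrow(2, 0, d_model)          # columns [0, d_model)
values = p_att_feats.narrow(2, d_model, d_model)  # columns [d_model, 2 * d_model)

print(p_att_feats.shape, keys.shape, values.shape)
```

Concatenating keys and values along the last dimension gives back p_att_feats, so nothing is lost or duplicated by the split.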

Jul 08 '20 10:07 husthuaan

This splits K (Key) and V (Value) for the attention module of the decoder. K and V are both linear transformations of the feature vectors.

The source code is: att = self.attention(h_att, p_att_feats.narrow(2, 0, self.multi_head_scale * self.d_model), p_att_feats.narrow(2, self.multi_head_scale * self.d_model, self.multi_head_scale * self.d_model), att_masks)

In the third argument, p_att_feats.narrow(2, self.multi_head_scale * self.d_model, self.multi_head_scale * self.d_model), start is equal to length: both are 1024.

I also printed both p_att_feats.narrow(2, 0, self.multi_head_scale * self.d_model) and p_att_feats.narrow(2, self.multi_head_scale * self.d_model, self.multi_head_scale * self.d_model). Both have shape (batch size, 196, 1024), which is the same as att_feat's shape. (I set the attention grid to 14*14, hence 196.)

So I am confused about how this code performs the split. And when start equals length in tensor.narrow(), why does it still work?

Oct 16 '20 03:10 binerone

The length of p_att_feats is 2*self.d_model (not self.d_model) in AoAModel.
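A small sketch of why start equal to length is fine when the last dimension is twice as long: narrow(dim, start, length) returns the entries from start up to start + length, so the second call simply picks the second half. The value d = 4 below is a stand-in for self.multi_head_scale * self.d_model and is chosen only to keep the output readable.

```python
import torch

d = 4  # stand-in for multi_head_scale * d_model (1024 in the question)

# Last dimension has size 2 * d, matching the point made above.
x = torch.arange(2 * 3 * 2 * d).reshape(2, 3, 2 * d)

first = x.narrow(2, 0, d)   # same as x[:, :, 0:d]
second = x.narrow(2, d, d)  # same as x[:, :, d:2*d]; start == length is a coincidence

print(first.shape, second.shape)  # both have last dimension d
print(first[0, 0].tolist())       # [0, 1, 2, 3]
print(second[0, 0].tolist())      # [4, 5, 6, 7]
```

Both slices have the same shape, which is why printing the shapes alone does not reveal that they cover different halves of the tensor.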

Nov 10 '20 02:11 husthuaan
