AoANet
Why p_att_feats.narrow(2, 0, self.multi_head_scale * self.d_model)? (line 170 in AoAModel.py)
This splits K (Key) and V (Value) for the decoder's attention module. K and V are both linear transformations of the feature vectors.
The source code is:
att = self.attention(h_att, p_att_feats.narrow(2, 0, self.multi_head_scale * self.d_model), p_att_feats.narrow(2, self.multi_head_scale * self.d_model, self.multi_head_scale * self.d_model), att_masks)
In the third parameter, p_att_feats.narrow(2, self.multi_head_scale * self.d_model, self.multi_head_scale * self.d_model), start is equal to length; both are 1024.
Also, I printed both p_att_feats.narrow(2, 0, self.multi_head_scale * self.d_model) and p_att_feats.narrow(2, self.multi_head_scale * self.d_model, self.multi_head_scale * self.d_model). Both have shape (batch size, 196, 1024), which is the same as att_feat's shape. (I set the attention size to 14*14.)
So I am confused about how this code performs the split. And when start = length in tensor.narrow(), why does this work?
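The last dimension of p_att_feats has length 2*self.d_model (not self.d_model) in AoAModel. Since narrow(dim, start, length) returns the length elements beginning at index start, start = length = 1024 selects indices 1024 to 2047, that is, the second half.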
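A minimal sketch of the narrow() behavior discussed here, in case it helps. It does not use the AoANet code itself: the tensor is a random stand-in for p_att_feats, and the sizes (batch of 2, 196 regions, d_model = 1024, multi_head_scale = 1) are assumptions taken from the numbers quoted in this thread.

```python
import torch

# Assumed sizes, matching the values mentioned in the thread
batch_size, num_regions, d_model, multi_head_scale = 2, 196, 1024, 1
half = multi_head_scale * d_model

# Random stand-in for p_att_feats: the last dimension holds two blocks of size `half`
p_att_feats = torch.randn(batch_size, num_regions, 2 * half)

# narrow(dim, start, length) keeps indices start .. start + length - 1 along `dim`
first_half = p_att_feats.narrow(2, 0, half)      # indices 0 .. half - 1
second_half = p_att_feats.narrow(2, half, half)  # indices half .. 2 * half - 1

print(first_half.shape, second_half.shape)

# The two calls are equivalent to ordinary slicing of the last dimension
same_as_slices = (
    torch.equal(first_half, p_att_feats[:, :, :half])
    and torch.equal(second_half, p_att_feats[:, :, half:])
)
print(same_as_slices)
```

The second argument of narrow() is a start index and the third is a length, not an end index, so passing the same value for both is valid whenever the dimension is at least twice that value.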