
About the dimension projection

Open shaonanqinghuaizongshishi opened this issue 2 years ago • 1 comment

The linear projection after the self-attention:

    bs = self_attention.size(0)
    self_attention = self_attention.view(bs, -1)
    linear_proj = F.relu(self.linear_projection(self_attention))

From the paper: "We project the self-attended neighbor encodings to a LARGER 4x2d dimensional space". But if you flatten the last two dimensions of "self_attention" before the projection, the input size becomes (number of neighbors) x 2d, so the output space is only "larger" when the number of neighbors is less than 4. How do you make sure that holds?

In my opinion, we should not flatten the last two dimensions before the projection. Instead, the projection should be applied to the last dimension, whose size is 2d; since 2d < 4x2d, that is genuinely a projection into a larger space.
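
To make the shapes concrete, here is a minimal sketch of the two options (not the actual repository code; the batch size, neighbor count N, and hidden size d are made up purely for illustration):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    bs, N, d = 8, 6, 16                            # assumed sizes for illustration
    self_attention = torch.randn(bs, N, 2 * d)     # (bs, N, 2d) self-attended neighbor encodings

    # Option A (the snippet above): flatten the neighbors, then project N*2d -> 4*2d.
    # This is only a projection into a larger space when N < 4.
    proj_a = nn.Linear(N * 2 * d, 4 * 2 * d)
    out_a = F.relu(proj_a(self_attention.view(bs, -1)))    # (bs, 4*2d)

    # Option B (what this issue suggests): project each neighbor's 2d encoding to 4*2d.
    # 2d < 4*2d holds regardless of the neighbor count.
    proj_b = nn.Linear(2 * d, 4 * 2 * d)
    out_b = F.relu(proj_b(self_attention))                  # (bs, N, 4*2d)

    print(out_a.shape, out_b.shape)   # torch.Size([8, 128]) torch.Size([8, 6, 128])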

Please point it out if I have misunderstood something, or if this was done on purpose for some reason.

@shaonanqinghuaizongshishi I think you are mistaking 4 for the number of neighbors here. 4 is an arbitrary number chosen by the authors without much explanation. The linear projection is simply sized so that, no matter what neighbor count and embedding size you use, the shapes are taken care of.
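
In other words, the layer's input size is derived from the configured neighbor count and embedding size, so the flattened projection always matches. A rough sketch of that idea (the module and the argument names n_neighbors / hidden_dim are placeholders, not necessarily what the repo uses):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NeighborProjection(nn.Module):
        """Illustrative only: flatten self-attended neighbors, then project to 4*2d."""
        def __init__(self, n_neighbors: int, hidden_dim: int):
            super().__init__()
            # Input size follows from the flattening, so any neighbor count / embedding size fits.
            self.linear_projection = nn.Linear(n_neighbors * 2 * hidden_dim, 4 * 2 * hidden_dim)

        def forward(self, self_attention: torch.Tensor) -> torch.Tensor:
            bs = self_attention.size(0)
            self_attention = self_attention.view(bs, -1)             # (bs, n_neighbors * 2d)
            return F.relu(self.linear_projection(self_attention))    # (bs, 4 * 2d)

    x = torch.randn(8, 10, 2 * 16)                                   # 10 neighbors, d = 16
    print(NeighborProjection(n_neighbors=10, hidden_dim=16)(x).shape)  # torch.Size([8, 128])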

Praneet9 · Mar 27 '22 12:03