
Hello, I'd like to ask a question that isn't about this model, but rather the attention-mask question you once raised on google-bert.

Open Yesgo1220 opened this issue 3 years ago • 1 comment

In BERT's `create_attention_mask_from_input_mask`, the source comment reads: "We don't assume that from_tensor is a mask (although it could be). We don't actually care if we attend from padding tokens (only to padding) tokens so we create a tensor of all ones." Under this scheme, the query positions that correspond to padding also produce meaningless attention scores. Are those handled somewhere later on? This has puzzled me for a long time. Thanks!
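To make the question concrete, here is a minimal NumPy sketch of what that BERT function does (a simplified reconstruction, not the original TensorFlow code): the key-side mask is broadcast over all query rows, so the padding query row still gets a full row of attention scores over the real tokens.

```python
import numpy as np

def create_attention_mask_from_input_mask(batch_size, seq_len, to_mask):
    # BERT-style mask: broadcast the key-side ("to") mask across every
    # query position. Query rows belonging to padding tokens therefore
    # still attend to all real tokens; their scores are computed but
    # (as the comment says) simply never cared about.
    to_mask = to_mask.reshape(batch_size, 1, seq_len)        # [B, 1, T]
    broadcast_ones = np.ones((batch_size, seq_len, 1))       # [B, F, 1]
    return broadcast_ones * to_mask                          # [B, F, T]

input_mask = np.array([[1, 1, 1, 0]])  # three real tokens, one padding token
mask = create_attention_mask_from_input_mask(1, 4, input_mask)
print(mask[0])
# Every query row, including row 3 (the padding token), is [1, 1, 1, 0]:
# only the *key* side is masked, never the *query* side.
```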

Yesgo1220 avatar Apr 15 '21 03:04 Yesgo1220

Unlike BERT, GPT is a decoder, so its attention mask is a lower-triangular matrix rather than all ones. As for the padding at the end of the sequence: no loss is applied at those output positions, so it does not affect the meaningful tokens that come before it.
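The two ingredients of that answer can be sketched as follows (a NumPy illustration; the `-100` ignore-label convention is borrowed from PyTorch's `CrossEntropyLoss(ignore_index=-100)` and is an assumption about the training setup, not something stated in this repo):

```python
import numpy as np

seq_len = 5
# GPT-style causal mask: position i may attend only to positions j <= i,
# so real tokens can never attend forward to trailing padding.
causal_mask = np.tril(np.ones((seq_len, seq_len)))

# Trailing padding positions get an "ignore" label, so their meaningless
# logits contribute zero loss and zero gradient.
labels = np.array([2, 7, 4, -100, -100])       # last two positions are padding
loss_mask = (labels != -100).astype(float)
print(causal_mask)
print(loss_mask)  # [1. 1. 1. 0. 0.]
```

Together these explain why the padding rows' attention scores are harmless: causality keeps them out of every real token's context, and the loss mask keeps them out of the gradient.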

haozheji avatar Apr 23 '21 03:04 haozheji