transformer
Questions about attention
When I checked the attention visualization, I found that the model actually puts some attention weight on the padding positions. I think this is because (1) the embedding uses dropout, and (2) after the first block, K and Q no longer carry the information about which positions are "padding" (the zeros are changed by the feed-forward layer), so the mask function didn't work.
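To make the question concrete, this is the kind of key-padding masking I would expect to zero out those positions: build the mask from the token ids (not from the embeddings, which dropout and the feed-forward layer will change) and apply it to the attention scores before the softmax in every layer. This is only a minimal PyTorch-style sketch, not the actual code; names like `pad_id`, `tokens`, and `scores` are illustrative.

```python
import torch
import torch.nn.functional as F

# Build the padding mask from token ids, not from embedding values,
# so later layers cannot "erase" the padding information.
pad_id = 0                                          # assumed padding token id
tokens = torch.tensor([[5, 7, 2, pad_id, pad_id]])  # (batch, seq_len)
pad_mask = tokens.eq(pad_id)                        # True where the key position is padding

d_k = 64
q = torch.randn(1, 5, d_k)                          # queries (batch, seq_len, d_k)
k = torch.randn(1, 5, d_k)                          # keys    (batch, seq_len, d_k)
v = torch.randn(1, 5, d_k)                          # values  (batch, seq_len, d_k)

# Scaled dot-product attention with the key-padding mask applied to the logits.
scores = q @ k.transpose(-2, -1) / d_k ** 0.5       # (batch, seq_len, seq_len)
scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))
attn = F.softmax(scores, dim=-1)                    # padding columns get ~0 weight
out = attn @ v
```

If the mask is instead derived from which embedding vectors are zero, it stops working after the first block for exactly the reasons above, since dropout and the feed-forward layer turn those zeros into nonzero values.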