Why do Key and Query Masking?

Open odie2630463 opened this issue 7 years ago • 9 comments

Nice job! But the paper doesn't describe Key and Query Masking. Can you give me some hints about that? Thanks!

odie2630463 avatar Jun 21 '17 02:06 odie2630463

The encoder leaves non-zero artifacts at the padding positions. It doesn't make sense for a query to attend to those, so before applying softmax to get the attention scores, I overwrite them with very large negative numbers. As a result, they get a score of 0. Likewise, queries at padding positions should not produce any values, so they are masked with zeros.

Kyubyong avatar Jun 22 '17 04:06 Kyubyong
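
To illustrate the reply above, here is a minimal NumPy sketch of key and query masking for a single, unbatched sequence. It is not the repository's code: the function name `masked_attention`, the boolean mask arguments, and the `-1e9` fill value are all assumptions made for this example.

```python
import numpy as np

def masked_attention(Q, K, V, query_mask, key_mask):
    """Scaled dot-product attention with key and query masking.

    Q: (T_q, d), K: (T_k, d), V: (T_k, d_v)
    query_mask: (T_q,) bool, True for real tokens, False for padding
    key_mask:   (T_k,) bool, True for real tokens, False for padding
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # (T_q, T_k)

    # Key masking: overwrite the columns of padded keys with a very large
    # negative number so that softmax assigns them a weight of 0.
    scores = np.where(key_mask[None, :], scores, -1e9)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Query masking: zero out the rows that belong to padded queries.
    weights = weights * query_mask[:, None]
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(4, 4))
V = rng.normal(size=(4, 2))
query_mask = np.array([True, True, False])      # last query is padding
key_mask = np.array([True, True, True, False])  # last key is padding

outputs, weights = masked_attention(Q, K, V, query_mask, key_mask)
```

In this sketch the column for the padded key carries zero weight in every row, and the row for the padded query is all zeros, so its output vector is zero as well.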

In Figure 2 of the paper, there is an optional masking layer. Masking is important if the source sequences vary widely in length. This project takes masking very seriously, in keys, queries and losses (almost everywhere possible). Great work, thanks!

zhedongzheng avatar Nov 01 '17 13:11 zhedongzheng

Whether masking is applied is decided by whether the sum of the keys or queries over the last dimension is zero. However, the padding positions, which are originally embedded as zeros, have the positional embedding added to them, so they will never be zero. Thanks!!!!

darongliu avatar Nov 06 '17 03:11 darongliu

@darongliu You have raised a very important point. We have to create the mask before the positional encoding is added; otherwise the key and query masking will never work. Thanks!

zhedongzheng avatar Nov 08 '17 14:11 zhedongzheng
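
The point raised in the two comments above can be checked with a small NumPy sketch. Everything here is an assumption for illustration, not the repository's code: the padding id `PAD_ID = 0`, the toy embedding table, the helper `sinusoidal_positions`, and the use of a sum of absolute values to detect all-zero rows.

```python
import numpy as np

PAD_ID = 0  # assumption: token id 0 marks padding

def sinusoidal_positions(T, d):
    """Sinusoidal positional encoding of shape (T, d)."""
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

token_ids = np.array([5, 7, 2, PAD_ID, PAD_ID])

rng = np.random.default_rng(0)
embedding = rng.normal(size=(10, 8))
embedding[PAD_ID] = 0.0  # the padding row is embedded as zeros

x = embedding[token_ids]  # (5, 8), padded rows are all zeros

# Mask built directly from the token ids: always reliable.
mask_from_ids = token_ids != PAD_ID

# Mask inferred from the embeddings BEFORE adding positions: still correct.
mask_before = np.abs(x).sum(axis=-1) != 0

# Mask inferred AFTER adding positions: padded rows are no longer zero,
# so every position looks like a real token.
x_pos = x + sinusoidal_positions(len(token_ids), x.shape[-1])
mask_after = np.abs(x_pos).sum(axis=-1) != 0
```

Here `mask_before` matches `mask_from_ids`, while `mask_after` marks every position as a real token, which is why the mask has to be computed before the positional encoding is added or taken directly from the token ids.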

@zhedongzheng Does this mean that this piece has a problem?

sunnnnnnnny avatar Nov 13 '18 08:11 sunnnnnnnny

No, I think it will not cause a great performance drop. It is still a good implementation.

darongliu avatar Nov 13 '18 09:11 darongliu

@darongliu Doesn't that mean that the query mask and key mask are useless?

sunnnnnnnny avatar Nov 13 '18 13:11 sunnnnnnnny

Is query masking unnecessary? Because the padded queries will be masked out by the next block's key masking?

Life-0-1 avatar Jan 07 '19 08:01 Life-0-1

> @darongliu Doesn't that mean that the query mask and key mask are useless?

It's not working. Check https://github.com/Kyubyong/transformer/issues/33

achillesliu avatar Jan 15 '19 09:01 achillesliu