K-BERT: attention score problem

Open • zhaiyutong opened this issue 3 years ago • 1 comment

Hi, first of all, thank you for your awesome work and code!

While porting your PyTorch code to TensorFlow, I ran into a question.

In your code, you build the attention mask from the visible matrix in bert_encoder.py, and then in multi_headed_attn.py you have the following code at line 59:

scores = scores + mask

I am wondering whether this corresponds to the attention score function, Eq. (5), in your paper. Is the mask the additional term M?

Thank you in advance for your response.

Dec 29 '20 06:12 zhaiyutong
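For reference, Eq. (5) of the K-BERT paper, as recalled here (please check the notation against the paper itself), defines the mask-self-attention scores of layer i+1 by adding the visible matrix M to the raw query-key scores before the softmax. Entries of M are 0 where two tokens are visible to each other and minus infinity where they are not:

```latex
S^{i+1} = \operatorname{softmax}\!\left(\frac{Q^{i+1}\,{K^{i+1}}^{\top} + M}{\sqrt{d_k}}\right),
\qquad
M_{ij} =
\begin{cases}
0 & \text{if } w_i \text{ and } w_j \text{ are visible to each other},\\
-\infty & \text{otherwise.}
\end{cases}
```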

Yes, the mask is represented as the matrix M in our paper.

Dec 29 '20 13:12 autoliuweijie
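A minimal NumPy sketch of what the confirmation means, for anyone porting the mask logic to another framework. It assumes the visible matrix arrives as 0/1 entries and is turned into an additive mask with -10000.0 standing in for minus infinity, a constant common in BERT-style PyTorch code; the exact constant in K-BERT's bert_encoder.py has not been re-checked here. The function names and the example visible matrix are hypothetical illustrations, not K-BERT's API.

```python
import numpy as np


def visible_matrix_to_mask(vm: np.ndarray) -> np.ndarray:
    """Turn a 0/1 visible matrix into an additive mask (the role of M).

    Visible pairs (vm == 1) map to 0.0; invisible pairs (vm == 0) map to
    -10000.0, a finite stand-in for -inf (assumed BERT-style constant).
    """
    return (1.0 - vm.astype(np.float64)) * -10000.0


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def masked_attention(q, k, v, vm):
    """Single-head attention where the mask is added to the scaled scores."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # (seq, seq) scaled dot products
    scores = scores + visible_matrix_to_mask(vm)  # analogue of `scores = scores + mask`
    probs = softmax(scores)                       # invisible pairs get ~0 weight
    return probs @ v, probs


rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
q, k, v = (rng.standard_normal((seq_len, d_k)) for _ in range(3))

# Hypothetical visible matrix: token 3 plays the part of an injected
# knowledge token and is visible only to token 2 and to itself.
vm = np.array([[1, 1, 1, 0],
               [1, 1, 1, 0],
               [1, 1, 1, 1],
               [0, 0, 1, 1]])

out, probs = masked_attention(q, k, v, vm)
print(np.round(probs, 3))

# Paper ordering of Eq. (5): add M *before* dividing by sqrt(d_k).
paper_probs = softmax((q @ k.T + visible_matrix_to_mask(vm)) / np.sqrt(d_k))
print(np.allclose(paper_probs, probs))
```

Because every entry of M is either 0 or a very large negative number, dividing it by sqrt(d_k) still leaves the invisible weights at (numerically) zero. Adding the mask after the scaling step therefore yields the same attention weights as the ordering written in Eq. (5), which is what the final comparison checks.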
