K-BERT
Attention score problem
Hi, first of all, thank you for your awesome work and code!
While porting your PyTorch code to TensorFlow, I ran into one question.
In your code, you build the attention mask from the visible matrix in bert_encoder.py, and then in multi_headed_attn.py you have the following code at line 59:
scores = scores + mask
I am wondering whether that corresponds to the attention score function (5) in your paper. Is the mask the additional M?
Thank you in advance for your response.
Yes, the mask is represented as the matrix M in our paper.
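For anyone else porting this, here is a minimal sketch of what adding the mask to the scores does before the softmax. It is plain Python rather than the repository's PyTorch code, and it assumes M holds 0 for visible positions and a large negative value for invisible ones. The helper names and the -10000.0 constant are illustrative assumptions, not taken from the repository.

```python
import math

def softmax(row):
    # Subtract the row maximum for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def additive_mask(visible_row, neg=-10000.0):
    # Hypothetical helper: 0 where a token is visible, a large
    # negative value where it is not (assumed constant).
    return [0.0 if v else neg for v in visible_row]

# Raw attention scores of one query token against three key tokens.
scores = [2.0, 1.0, 0.5]
# 1 = visible, 0 = invisible (one row of a visible matrix).
visible = [1, 0, 1]

mask = additive_mask(visible)
# Same operation as `scores = scores + mask`, applied to one row.
masked_scores = [s + m for s, m in zip(scores, mask)]
weights = softmax(masked_scores)
```

After the softmax, the invisible position gets a weight of effectively zero, and the remaining weights are renormalized over the visible positions only.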