protein-bert-pytorch

Computing Global Attention

Basso42 opened this issue on Dec 11, 2023 · 0 comments

Hello,

I am trying to use the code and adapt it to a particular task. I have a few questions about it; I am sorry if the answers are simple, but some parts are quite obscure to me. In the protein_bert_pytorch.py file, there are two main classes related to the attention mechanism: GlobalLinearSelfAttention and CrossAttention.

  • In the first one (from l. 27 to l. 62), the forward method extracts the query, key and value vectors from the same input, normalizes them with a softmax, and then performs a tensor contraction (see the first sketch after this list). I did not find anything similar in the TensorFlow model or in the paper; what is the idea behind these operations?

  • In the second one (from l. 64 to l. 121), on lines 86 and 87, two randomly initialized tensors are created and then concatenated to the incoming key and value vectors on lines 97 and 98 (see the second sketch below). Why is that, and what in the paper or in the TensorFlow code justifies this operation?
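
For concreteness, here is a minimal sketch of the first pattern as I understand it (my own paraphrase with a single head and illustrative names, not the repository's exact code):

```python
import torch
import torch.nn as nn

class LinearSelfAttentionSketch(nn.Module):
    # Paraphrase of the pattern I mean: q/k/v come from the same input,
    # are softmax-normalized, and are combined by two tensor contractions.
    def __init__(self, dim):
        super().__init__()
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)

    def forward(self, x):                          # x: (batch, seq, dim)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q = q.softmax(dim=-1)                      # normalize queries over features
        k = k.softmax(dim=-2)                      # normalize keys over the sequence
        # contract keys with values into a (dim x dim) summary, then apply it
        # to every query; the cost is linear in the sequence length
        context = torch.einsum('b n d, b n e -> b d e', k, v)
        return torch.einsum('b n d, b d e -> b n e', q, context)
```

If this reading is correct, it looks like a form of linear (global) attention rather than standard softmax attention, which is part of what I would like to confirm.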

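Similarly, here is my paraphrase of the concatenation step in the second class (single head, illustrative shapes; null_key and null_value are names I chose, not necessarily the ones in the code):

```python
import torch
import torch.nn as nn

class CrossAttentionConcatSketch(nn.Module):
    # Paraphrase of the step I am asking about: two randomly initialized,
    # learned tensors are prepended to the real keys and values.
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, dim * 2, bias=False)
        self.null_key = nn.Parameter(torch.randn(dim))    # the two "random" tensors
        self.null_value = nn.Parameter(torch.randn(dim))

    def forward(self, x, context):                 # x: (b, n, d), context: (b, m, d)
        b, scale = x.shape[0], x.shape[-1] ** -0.5
        q = self.to_q(x)
        k, v = self.to_kv(context).chunk(2, dim=-1)
        # prepend one extra token to the keys and values, so every query can put
        # some attention mass on it instead of on the real context tokens
        k = torch.cat((self.null_key.expand(b, 1, -1), k), dim=1)
        v = torch.cat((self.null_value.expand(b, 1, -1), v), dim=1)
        attn = (q @ k.transpose(-1, -2) * scale).softmax(dim=-1)
        return attn @ v
```

My (unconfirmed) guess is that this acts like a learned "null" key/value token that the queries can fall back on, but I cannot find where this comes from in the paper or the TensorFlow code.
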
I have other questions, but these are the most important to me, as I do not understand how these operations relate to the paper or to the TensorFlow implementation.

Thank you for your help!

[Screenshot attached: Screenshot from 2023-12-11 14-59-32]

Basso42 · Dec 11 '23 18:12