
Confusion when compared with the Paper.

Open abhinab303 opened this issue 4 years ago • 10 comments

It seems that in the paper, type-level attention is calculated first, followed by node-level attention, but in the code it looks like the opposite order. Also, in node-level attention there is no concatenation operation like in the paper. And even after applying softmax there are extra steps whose connection to the equations in the paper I don't understand.

attention = F.softmax(attention, dim=1)
attention = torch.mul(attention, adj.sum(1).repeat(M, 1).t())
attention = torch.add(attention * self.gamma, adj.to_dense() * (1 - self.gamma))
h_prime = torch.matmul(attention, g)

What is the significance of these steps? And can someone explain the node-level attention part of the code?

abhinab303 avatar Nov 26 '20 14:11 abhinab303

Hi, below is my opinion; it may not be entirely accurate.

attention = torch.mul(attention, adj.sum(1).repeat(M, 1).t())

I think that after the softmax above, the attention weights have been scaled down (each row sums to 1), so they are recovered to the original weight level by using torch.mul with the total connection count of each node.
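Here is a minimal sketch of that idea with toy tensors (the shapes and values are made up, not the repo's real ones): after softmax every row of attention sums to 1, so multiplying by the node degree adj.sum(1) scales each row back up to the size of its neighborhood.

import torch

N = 4
adj = (torch.rand(N, N) > 0.5).float()                  # toy dense adjacency
attention = torch.softmax(torch.randn(N, N), dim=1)     # rows sum to 1 after softmax
degree = adj.sum(1)                                     # connection count per node
rescaled = torch.mul(attention, degree.repeat(N, 1).t())
print(attention.sum(1))                                 # all ones
print(rescaled.sum(1))                                  # equal to degree (up to float error)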

attention = torch.add(attention * self.gamma, adj.to_dense() * (1 - self.gamma))

I think this statement handles the balance between the attention weights and the original adjacency weights, so it uses gamma to combine the two.
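As a rough sketch of that blend (toy values; the gamma value below is assumed, just for illustration), the final weights are a convex combination of the learned attention and the raw adjacency, so gamma controls how much the model trusts the attention over the graph structure itself:

import torch

gamma = 0.7                                              # assumed value for illustration
N = 4
adj = (torch.rand(N, N) > 0.5).float()                   # toy dense adjacency
attention = torch.softmax(torch.randn(N, N), dim=1) * adj.sum(1, keepdim=True)
blended = torch.add(attention * gamma, adj * (1 - gamma))
# gamma = 1.0 -> pure attention weights; gamma = 0.0 -> plain adjacency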

You could use ipdb to debug the code and see what happens under the hood.
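For example (the exact placement is hypothetical; you would put it just before the lines you are unsure about in the node-level attention forward pass):

import ipdb; ipdb.set_trace()
# at the prompt, inspect e.g. attention.shape, attention.sum(1), adj.sum(1), self.gamma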

jimmy-walker avatar Nov 30 '20 04:11 jimmy-walker

@jimmy-walker Thank you. Yes, it kind of makes sense now. There is one more thing. The last line:

h_prime = torch.matmul(attention, g)

Is this related to equation (7)?

abhinab303 avatar Nov 30 '20 06:11 abhinab303

It may correspond to the part of equation (7) that sits inside the summation (the sigma). It's been a long time since I read the code above, sorry; I'm busy with work right now. If I have time, I will recheck it carefully.
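For what it's worth, here is a small toy check (shapes made up) that torch.matmul(attention, g) is the neighbor-weighted sum appearing inside the summation of equation (7): row i of the result is the sum over j of attention[i, j] * g[j].

import torch

N, F = 4, 8
attention = torch.rand(N, N)                             # toy attention weights
g = torch.rand(N, F)                                     # toy transformed node features
h_prime = torch.matmul(attention, g)
manual = torch.stack([(attention[i].unsqueeze(1) * g).sum(0) for i in range(N)])
print(torch.allclose(h_prime, manual))                   # True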

jimmy-walker avatar Nov 30 '20 06:11 jimmy-walker

Okay, no worries. I might write an explanation of how equations (5) and (6) are related to the code, so it'll be easier for others too. Thanks again.

abhinab303 avatar Nov 30 '20 07:11 abhinab303


Good job, looking forward to seeing the explanation.

jimmy-walker avatar Nov 30 '20 07:11 jimmy-walker

Bro, in the type-level attention code (SelfAttention), what is the explanation of this line?

outputs = torch.matmul(weights.transpose(1, 2), inputs).squeeze(1) * 3

Thanks!

YiTangJ avatar Mar 01 '21 08:03 YiTangJ

@YiTangJ

  1. outputs = weights_transpose · inputs × 3 (I'm not sure why it is multiplied by 3; see the shape sketch below)
  2. The redundant dimension is removed by squeeze.
  3. The output shape is (N, in_features) = (N, 512).
  4. torch.matmul uses batched matrix multiplication here. See here.
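Here is a shape sketch of that line with assumed sizes (N nodes, T = 3 node types, in_features = 512; these follow the thread, not the repo itself). The batched matmul takes a weighted sum over the T type-level embeddings for every node:

import torch

N, T, F = 5, 3, 512
weights = torch.softmax(torch.randn(N, T, 1), dim=1)     # type-level attention scores
inputs = torch.randn(N, T, F)                            # per-type node embeddings
outputs = torch.matmul(weights.transpose(1, 2), inputs).squeeze(1) * 3
# (N, 1, T) @ (N, T, F) -> (N, 1, F); squeeze(1) -> (N, F)
print(outputs.shape)                                     # torch.Size([5, 512])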

abhinab303 avatar Mar 01 '21 12:03 abhinab303

Thanks! And you mean outputs = weights_transpose * inputs * 3, right?

YiTangJ avatar Mar 01 '21 13:03 YiTangJ

Yes, outputs = weights_transpose * inputs * 3. Do you know why it is multiplied by 3?

abhinab303 avatar Mar 01 '21 14:03 abhinab303

@abhinab303 In my opinion, it's because there are three types of nodes: short text / entity / topic. But I'm not sure.

YiTangJ avatar Mar 02 '21 07:03 YiTangJ