HGAT
Confusion when compared with the Paper.
It seems that in the paper, type-level attention is calculated first, followed by node-level attention, but in the code it looks like the opposite. Also, in the node-level attention there is no concatenation operation like in the paper. And even after applying softmax, there are extra steps that I don't understand in relation to the equations in the paper.
attention = F.softmax(attention, dim=1)
attention = torch.mul(attention, adj.sum(1).repeat(M, 1).t())
attention = torch.add(attention * self.gamma, adj.to_dense() * (1 - self.gamma))
h_prime = torch.matmul(attention, g)
What is the significance of these steps? Also, can someone explain the node-level attention part of the code?
Hi, below is my opinion; it may not be entirely correct.
attention = torch.mul(attention, adj.sum(1).repeat(M, 1).t())
I think that after the softmax operation above, the attention weights have been scaled down (each row sums to 1), so they are restored to their original magnitude by using torch.mul with each node's total connection count (its degree).
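For reference, here is a minimal toy check of that "recover the scale" reading: after a row-wise softmax every row sums to 1, and multiplying by adj.sum(1) (each node's degree) brings the total weight of a row back to its number of connections. The shapes N and M and the dense 0/1 adjacency below are my own assumptions, not values from the repo.

import torch
import torch.nn.functional as F

N, M = 4, 5
scores = torch.randn(N, M)                    # toy raw attention logits
adj = (torch.rand(N, M) > 0.5).float()        # toy 0/1 adjacency

attention = F.softmax(scores, dim=1)          # each row now sums to 1
print(attention.sum(1))                       # ~ tensor([1., 1., 1., 1.])

# adj.sum(1) is each node's degree; repeating and transposing broadcasts
# it across the row, so row i is rescaled by degree_i.
degree = adj.sum(1)                           # shape (N,)
rescaled = torch.mul(attention, degree.repeat(M, 1).t())
print(rescaled.sum(1))                        # equals the degrees again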
attention = torch.add(attention * self.gamma, adj.to_dense() * (1 - self.gamma))
I think this statement balances the learned attention against the original adjacency weights, so it uses gamma to combine the two.
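Under that reading, the line is just a convex combination of the learned attention and the raw adjacency, controlled by gamma. A minimal sketch with toy tensors (the value 0.8 is hypothetical; in the layer self.gamma is presumably a hyperparameter):

import torch
import torch.nn.functional as F

N, M = 4, 5
adj = (torch.rand(N, M) > 0.5).float()           # toy dense adjacency
attention = F.softmax(torch.randn(N, M), dim=1)  # toy attention weights
gamma = 0.8                                      # hypothetical value

# gamma -> 1 keeps only the learned attention,
# gamma -> 0 falls back to the plain adjacency weights.
mixed = torch.add(attention * gamma, adj * (1 - gamma))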
You could use ipdb to debug the code and see what happens under the hood.
@jimmy-walker Thank you. Yes, it kind of makes sense now. There is one more thing. The last line:
h_prime = torch.matmul(attention, g)
Is this related to equation (7)?
It may correspond to the part of equation (7) that sits inside the sigma (the summation). It has been a long time since I read the code above. 囧rz Sorry, I am busy with work right now; if I have time, I will recheck it carefully.
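If it helps, here is a toy check (my own sketch, not code from the repo) that torch.matmul(attention, g) is exactly the weighted sum over neighbours inside the sigma of equation (7); the nonlinearity is applied afterwards, outside this line:

import torch

N, M, F_out = 4, 5, 8
attention = torch.rand(N, M)        # toy attention weights
g = torch.rand(M, F_out)            # toy transformed neighbour features

h_prime = torch.matmul(attention, g)

# Row i of h_prime is sum_j attention[i, j] * g[j]
i = 0
manual = sum(attention[i, j] * g[j] for j in range(M))
print(torch.allclose(h_prime[i], manual))   # True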
Okay, no worries. I might write an explanation of how equations (5) and (6) relate to the code, so it will be easier for others too. Thanks again.
Good job, looking forward to seeing the explanation.
Bro, in the type_level code (SelfAttention), what is the explanation of this line? outputs = torch.matmul(weights.transpose(1, 2), inputs).squeeze(1) * 3 Thanks!
@YiTangJ
- outputs = weights_transpose · inputs * 3 (a matrix product; I'm not sure why it is multiplied by 3)
- the redundant dimension is removed by squeeze
- outputs shape = (N, in_features) = (N, 512)
- torch.matmul uses batched matrix multiplication. See here. A shape sketch follows below.
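A small shape walkthrough of that line (toy values; in_features = 512 and M = 3 node types are taken from this thread, everything else is made up):

import torch

N, M, in_features = 10, 3, 512
inputs = torch.rand(N, M, in_features)   # per-type embeddings for each node
weights = torch.rand(N, M, 1)            # type-level attention weights

out = torch.matmul(weights.transpose(1, 2), inputs)  # (N, 1, M) @ (N, M, 512) -> (N, 1, 512)
outputs = out.squeeze(1) * 3                         # (N, 512); *3 is the unexplained constant
print(outputs.shape)                                 # torch.Size([10, 512])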
Thanks! And you mean outputs = weights_transpose * inputs * 3, right?
Yes, outputs = weights_transpose * inputs * 3
Do you know why it is multiplied by 3?
@abhinab303 In my opinion, it is because there are three types of nodes: short text, entity, and topic. But I'm not sure.