HGAT
Confusion when compared with the Paper.
It seems that in the paper, type-level attention is calculated first, followed by node-level attention, but in the code it looks like the opposite. Also, in the node-level attention there is no concatenation operation like in the paper. And even after applying softmax, there are extra steps that I don't understand in relation to the equations in the paper.
attention = F.softmax(attention, dim=1)
attention = torch.mul(attention, adj.sum(1).repeat(M, 1).t())
attention = torch.add(attention * self.gamma, adj.to_dense() * (1 - self.gamma))
h_prime = torch.matmul(attention, g)
What is the significance of these steps? Also, can someone explain the node-level attention part of the code?
Hi, below is my opinion; it may not be entirely correct.
attention = torch.mul(attention, adj.sum(1).repeat(M, 1).t())
I think that after the softmax operation above, the attention weights have been scaled down (each row sums to 1), so they are restored to their original magnitude by using torch.mul with each node's total connection count (its degree).
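For reference, here is a minimal toy check of that "recover the scale" reading: after a row-wise softmax every row sums to 1, and multiplying by adj.sum(1) (each node's degree) brings the total weight of a row back to its number of connections. The shapes N and M and the dense 0/1 adjacency below are my own assumptions, not values from the repo.

import torch
import torch.nn.functional as F

N, M = 4, 5
scores = torch.randn(N, M)                    # toy raw attention logits
adj = (torch.rand(N, M) > 0.5).float()        # toy 0/1 adjacency

attention = F.softmax(scores, dim=1)          # each row now sums to 1
print(attention.sum(1))                       # ~ tensor([1., 1., 1., 1.])

# adj.sum(1) is each node's degree; repeating and transposing broadcasts
# it across the row, so row i is rescaled by degree_i.
degree = adj.sum(1)                           # shape (N,)
rescaled = torch.mul(attention, degree.repeat(M, 1).t())
print(rescaled.sum(1))                        # equals the degrees again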
attention = torch.add(attention * self.gamma, adj.to_dense() * (1 - self.gamma))
I think this statement balances the learned attention against the original adjacency weights, so it uses gamma to combine the two.
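Under that reading, the line is just a convex combination of the learned attention and the raw adjacency, controlled by gamma. A minimal sketch with toy tensors (the value 0.8 is hypothetical; in the layer self.gamma is presumably a hyperparameter):

import torch
import torch.nn.functional as F

N, M = 4, 5
adj = (torch.rand(N, M) > 0.5).float()           # toy dense adjacency
attention = F.softmax(torch.randn(N, M), dim=1)  # toy attention weights
gamma = 0.8                                      # hypothetical value

# gamma -> 1 keeps only the learned attention,
# gamma -> 0 falls back to the plain adjacency weights.
mixed = torch.add(attention * gamma, adj * (1 - gamma))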
You could use ipdb to debug the code and see what happens under the hood.
@jimmy-walker Thank you. Yes, it kind of makes sense now. There is one more thing. The last line:
h_prime = torch.matmul(attention, g)
Is this related to equation (7)?
It may correspond to the part of equation (7) that sits inside the sigma (the summation). It has been a long time since I read the code above. 囧rz Sorry, I am busy with work right now; if I have time, I will recheck it carefully.
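If it helps, here is a toy check (my own sketch, not code from the repo) that torch.matmul(attention, g) is exactly the weighted sum over neighbours inside the sigma of equation (7); the nonlinearity is applied afterwards, outside this line:

import torch

N, M, F_out = 4, 5, 8
attention = torch.rand(N, M)        # toy attention weights
g = torch.rand(M, F_out)            # toy transformed neighbour features

h_prime = torch.matmul(attention, g)

# Row i of h_prime is sum_j attention[i, j] * g[j]
i = 0
manual = sum(attention[i, j] * g[j] for j in range(M))
print(torch.allclose(h_prime[i], manual))   # True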
Okay, no worries. I might write an explanation of how equations (5) and (6) relate to the code, so it will be easier for others too. Thanks again.
Good job, looking forward to seeing the explanation.
Bro, in the type_level code (SelfAttention), what is the explanation of this line? outputs = torch.matmul(weights.transpose(1, 2), inputs).squeeze(1) * 3 Thanks!
@YiTangJ
- outputs = weights_transpose · inputs * 3 (a matrix product; I'm not sure why it is multiplied by 3)
- the redundant dimension is removed by squeeze
- outputs shape = (N, in_features) = (N, 512)
- torch.matmul uses batched matrix multiplication. See here. A shape sketch follows below.
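A small shape walkthrough of that line (toy values; in_features = 512 and M = 3 node types are taken from this thread, everything else is made up):

import torch

N, M, in_features = 10, 3, 512
inputs = torch.rand(N, M, in_features)   # per-type embeddings for each node
weights = torch.rand(N, M, 1)            # type-level attention weights

out = torch.matmul(weights.transpose(1, 2), inputs)  # (N, 1, M) @ (N, M, 512) -> (N, 1, 512)
outputs = out.squeeze(1) * 3                         # (N, 512); *3 is the unexplained constant
print(outputs.shape)                                 # torch.Size([10, 512])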
Thanks! And you mean outputs = weights_transpose * inputs * 3, right?
Yes, outputs = weights_transpose * inputs * 3
Do you know why it is multiplied by 3?
@abhinab303 In my opinion, it is because there are three types of nodes: short text, entity, and topic. But I'm not sure.