
implementation of knowledge graph attention

Open junkangwu opened this issue 3 years ago • 10 comments

From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.

❓ Questions & Help

Hi, I'd like to implement the function shown below: [figures: equations for the attention scores]. That is, I need to compute an attention score alpha between the head entity and its relations, and an attention score beta between a relation and the tail entity. How can I use PyG to compute attention scores over only a subset of neighbors, rather than all of them? Note: I can compute the numerator, but it is not clear how to compute the denominator. [figures: tensor dimensions inside the softmax function], where num_edge is the total number of entries in edge_index. Looking forward to your help. Thanks a lot!

junkangwu avatar Apr 12 '21 16:04 junkangwu

You can filter your index via edge_type holding the relation type information, e.g.:

    for i in range(num_relations):
        mask = edge_type == i
        self.propagate(edge_index[:, mask], x=x, relation=i)

    def message(self, x_j, index, relation):
        alpha = ...  # Compute attention scores
        alpha = softmax(alpha, index)
        ...

rusty1s avatar Apr 13 '21 06:04 rusty1s
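As a concrete illustration of the masking step above (toy tensors only, no PyG required — the graph and relation types are made up for the example):

```python
import torch

# Toy graph: 5 edges, 2 relation types.
edge_index = torch.tensor([[0, 0, 1, 2, 2],
                           [1, 2, 2, 3, 4]])
edge_type = torch.tensor([0, 1, 1, 0, 1])

# Select only the edges of relation type 1, as done per iteration in the loop.
mask = edge_type == 1
sub_edge_index = edge_index[:, mask]
print(sub_edge_index)  # columns (0,2), (1,2), (2,4)
```

Each iteration of the loop would then propagate messages only along this relation-specific subgraph.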

Thanks a lot, I understand your idea now!

junkangwu avatar Apr 13 '21 07:04 junkangwu

@rusty1s By the way, I find it difficult to compute alpha and beta at the same time. If I filter the index via edge_type, I am restricted to the subgraph for relation r_i. The beta attention can then be implemented, but alpha is still hard to handle: inside self.propagate, the edge_index only contains edges of type r_i, so for attention alpha I cannot see the other relations that are also neighbors of entity u.

The desired final result is shown below. For example: attention_{e_1,e_2} = alpha_{e_1,r_1} * beta_{r_1,e_2} [figure: example graph]. Looking forward to your help again!

junkangwu avatar Apr 13 '21 14:04 junkangwu

I see. You can also do the filtering directly inside message:

    self.propagate(edge_index, x=x, edge_type=edge_type)

    def message(self, x_i, x_j, index, edge_type):
        # Compute alpha based on `index`
        for i in range(self.num_relations):
            cur_index = index[edge_type == i]
            # Compute beta based on `cur_index` for relational type `i`

rusty1s avatar Apr 14 '21 08:04 rusty1s
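The grouped softmax that `message` relies on can be sketched in pure PyTorch as follows (a stand-in for `torch_geometric.utils.softmax` so the snippet runs without PyG; the function name and toy values are illustrative):

```python
import torch

def grouped_softmax(src, index, num_groups):
    # Numerically stable softmax over groups defined by `index`
    # (pure-torch stand-in for torch_geometric.utils.softmax).
    src_max = torch.full((num_groups,), float('-inf')).scatter_reduce(
        0, index, src, reduce='amax')
    out = (src - src_max[index]).exp()
    denom = torch.zeros(num_groups).index_add_(0, index, out)
    return out / denom[index]

# Three edge scores; the first two edges share the same group (e.g. target node).
index = torch.tensor([0, 0, 1])
scores = torch.tensor([1.0, 1.0, 2.0])
print(grouped_softmax(scores, index, 2))  # tensor([0.5000, 0.5000, 1.0000])
```

Calling this with `cur_index` for each relation type yields the per-relation beta scores; calling it with the plain destination index yields a softmax over the full neighborhood.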

  • Cool. For `# Compute beta based on cur_index for relational type i`: would you use torch.where to fill the values at the specific indices in order to form a complete matrix? Do you agree?

  • By the way, in the procedure of `# Compute alpha based on index`, it seems very easy to double-count attention for the same entity and the same edge. For example, in the figure above: alpha_{e_1, r_1} = 2 * value_{e_1, r_1} / (2 * value_{e_1, r_1} + 1 * value_{e_1, r_2} + 3 * value_{e_1, r_3}), but the desired result is: alpha_{e_1, r_1} = value_{e_1, r_1} / (value_{e_1, r_1} + value_{e_1, r_2} + value_{e_1, r_3}).

I'm sorry to trouble you so many times as a newcomer. Thanks a lot in advance. @rusty1s

junkangwu avatar Apr 14 '21 09:04 junkangwu

  1. Yes, I agree.
  2. Alpha is indeed a bit tricky to compute. I would probably go with re-weighting attention coefficients based on the number of equally typed relations in a neighborhood. Does that make sense to you?
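One way to sketch that re-weighting idea: count how many edges share each (head, relation) pair and scale each edge's contribution by the reciprocal, so duplicated pairs are not double-counted. The pair encoding below is a toy assumption (it presumes fewer than 10 relation types):

```python
import torch

# Relation r_0 appears twice at head node 0; r_1 and r_2 appear once each.
edge_head = torch.tensor([0, 0, 0, 0])
edge_type = torch.tensor([0, 0, 1, 2])

# Toy group id per (head, relation) pair; assumes < 10 relation types.
pair = edge_head * 10 + edge_type
_, inv, counts = torch.unique(pair, return_inverse=True, return_counts=True)

# Reciprocal of the multiplicity of each edge's (head, relation) pair.
weight = 1.0 / counts[inv].float()
print(weight)  # the two r_0 edges each get 0.5, the others 1.0
```

Multiplying the exponentiated scores by `weight` before the softmax sum turns the unwanted 2·value_{e_1,r_1} term back into value_{e_1,r_1}.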

rusty1s avatar Apr 14 '21 11:04 rusty1s

Thanks a lot, I will give it a try. It does indeed seem a bit tricky to compute.

junkangwu avatar Apr 14 '21 11:04 junkangwu

Thanks a lot for your patience. I have tried a trick and addressed it. The main idea is to control the index so that the softmax function can be used. We just need to prepare some indices in advance, which does not affect the training speed. I made a custom adjustment to your softmax function:

  from torch_scatter import scatter, segment_csr, gather_csr

  def softmax(src, index, N, index2=None, ptr=None):
      if ptr is not None:
          src_max = gather_csr(segment_csr(src, ptr, reduce='max'), ptr)
          out = (src - src_max).exp()
          out_sum = gather_csr(segment_csr(out, ptr, reduce='sum'), ptr)
      elif index2 is not None:  # the repeated (head, relation) pairs that were thrown out need to be selected back via index2
          src_max = scatter(src, index, dim_size=N, reduce='max')[index2]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index2]
      elif index is not None:
          src_max = scatter(src, index, dim_size=N, reduce='max')[index]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index]
      else:
          raise NotImplementedError
      return out / (out_sum + 1e-16)

The detailed attention computation procedure is as follows:

import torch
import torch.nn as nn

edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2],
        [1, 4, 2, 3, 5, 6, 3]])
# edge_index = torch.LongTensor([[0, 1, 1, 2, 2, 1, 0],
#         [1, 4, 2, 3, 5, 6, 3]])
edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3])
r = torch.randn(4, 4)  # embeddings for relation types
h = torch.randn(7, 4)  # embeddings for entities
weight = torch.randn(8, 1)

# prepare the group index for beta
dict_rt = {}
index_rt = []
cnt = 0
for i in range(edge_index.size(1)):
    key = (edge_index[0, i].item(), edge_type[i].item())
    if key not in dict_rt:
        index_rt.append(cnt)
        dict_rt[key] = cnt
        cnt += 1
    else:
        # edges with the same (head, relation) share an index, so their
        # attention scores are summed within that (head, relation) group
        index_rt.append(dict_rt[key])
beta = torch.cat([h[edge_index[1]], r[edge_type]], dim=1) @ weight  # attention between tail and relation [num_edge, 1]
beta = nn.LeakyReLU(0.2)(beta)
print('beta before')
print(beta)
beta = softmax(beta, torch.LongTensor(index_rt), 7)
print('beta after')
print(beta)

# prepare the group index for alpha
index_hr = []
dict_hr = {}
cnt = 0
for i in range(edge_index.size(1)):
    key = (edge_index[0, i].item(), edge_type[i].item())
    if key not in dict_hr:
        # edges with the same head share an index, so their attention
        # scores are summed per head
        index_hr.append(edge_index[0, i].item())
        dict_hr[key] = cnt
        cnt += 1
    else:
        # repeated (head, relation) pairs are thrown out to avoid double
        # counting, by assigning them a spare maximum index value
        index_hr.append(edge_index.size(1) - 1)
alpha = torch.cat([h[edge_index[0]], r[edge_type]], dim=1) @ weight  # attention between head and relation [num_edge, 1]
alpha = nn.LeakyReLU(0.2)(alpha)
print('alpha before')
print(alpha)
alpha = softmax(alpha, torch.LongTensor(index_hr), 7, edge_index[0])
print('alpha after')
print(alpha)
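As a side note, the Python loop that builds index_rt above can be vectorized with torch.unique. The group labels may come out in a different order than the loop produces (torch.unique sorts), but the grouping is identical, which is all the softmax needs:

```python
import torch

edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2],
                               [1, 4, 2, 3, 5, 6, 3]])
edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3])

num_relations = 4
# Unique scalar id per (head, relation) pair, then map each edge to its group.
pair = edge_index[0] * num_relations + edge_type
_, index_rt = torch.unique(pair, return_inverse=True)
print(index_rt)  # tensor([0, 0, 1, 2, 2, 2, 3])
```

Here the vectorized result happens to match the loop's output exactly, since the pairs already appear in sorted order.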

Focusing on the example in the figure, the final result is:

beta before
tensor([[-0.4961],
        [-0.3258],
        [ 2.4938],
        [ 3.6552],
        [ 3.0433],
        [ 2.1643],
        [ 3.5956]])
beta after
tensor([[0.4575],
        [0.5425],
        [1.0000],
        [0.5658],
        [0.3068],
        [0.1274],
        [1.0000]])
alpha before
tensor([[ 0.0215],
        [ 0.0215],
        [-0.3617],
        [-0.0067],
        [-0.0067],
        [-0.0067],
        [ 4.2091]])
alpha after
tensor([[0.3768],
        [0.3768],
        [0.2569],
        [0.3663],
        [0.3663],
        [0.3663],
        [1.0000]])

Overall, thanks a lot for your patience and for sharing!

junkangwu avatar Apr 14 '21 12:04 junkangwu


Hello, I want to reproduce RGHAT. Did you manage to reproduce the result? I think computing alpha and beta at the same time in PyG is still hard.

yuto3o avatar Apr 16 '21 12:04 yuto3o


Sorry to reply so late. However, I failed to reproduce it...

junkangwu avatar Jun 20 '22 08:06 junkangwu