KnowledgeGraphEmbedding

Three questions

[Open] zulihit opened this issue 3 years ago · 2 comments

Thank you for your work. I have three questions:

1. Why do you use the following method to calculate the initialization range? I didn't find it described in the paper. What is its purpose?

```python
self.embedding_range = nn.Parameter(
    torch.Tensor([(self.gamma.item() + self.epsilon) / hidden_dim]),
    requires_grad=False
)

self.entity_embedding = nn.Parameter(torch.zeros(nentity, self.entity_dim))
nn.init.uniform_(
    tensor=self.entity_embedding,
    a=-self.embedding_range.item(),
    b=self.embedding_range.item()
)
```

2. This range is also used when mapping relations to complex numbers. Why is this valid?

```python
phase_relation = relation / (self.embedding_range.item() / pi)
re_relation = torch.cos(phase_relation)
im_relation = torch.sin(phase_relation)
```

3. In the RotatE model, the head-batch and tail-batch scores are computed with different signs. I can't find the head-batch part in the paper, so I don't understand it.

```python
if mode == 'head-batch':
    re_score = re_relation * re_tail + im_relation * im_tail
    im_score = re_relation * im_tail - im_relation * re_tail
    re_score = re_score - re_head
    im_score = im_score - im_head
else:
    re_score = re_head * re_relation - im_head * im_relation
    im_score = re_head * im_relation + im_head * re_relation
    re_score = re_score - re_tail
    im_score = im_score - im_tail
```

zulihit avatar May 26 '22 07:05 zulihit

I hope this can be of help for anybody who, like me, struggled to understand point 2 (and, as a consequence, point 1): the reason the values of the relation embeddings are projected into $[-\pi, \pi]$ is that, if we initialize the weights uniformly as done with Xavier initialization, for example, the values assigned to the relation embeddings would be very close to zero. According to some experiments I ran, the model in this case tends to learn rotations with angles very close to zero, thus making triples like (head, relation, head) extremely plausible: indeed, the rotation would be almost null, so that $h \circ r \approx h$. This basically forces MRR and H@1 to collapse to zero, while leaving H@3, H@10 and MR good.

Instead, if we project the values of the relation embeddings into the range $[-\pi, \pi]$ (by using `phase_relation = relation / (self.embedding_range.item() / pi)`), the rotations are no longer all almost null; there is more variability, so we can learn better representations and hence get better results.

In light of this, I believe the initialization of the relations as in point 1 of the question above is just a convenient way of getting a uniform initialization (as with Xavier), but with more straightforward extremes.
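To make this concrete, here is a minimal sketch (my own illustration, not code from the repository; the hyperparameter values `gamma = 24.0`, `epsilon = 2.0` and `hidden_dim = 1000` are assumptions chosen for the example) comparing the rescaled phases with raw Xavier-style values used directly as angles:

```python
# Minimal sketch (my own illustration, not repository code).
# Assumed hyperparameters for the example: gamma=24.0, epsilon=2.0, hidden_dim=1000.
import math
import torch

gamma, epsilon, hidden_dim = 24.0, 2.0, 1000
embedding_range = (gamma + epsilon) / hidden_dim          # 0.026

# RotatE-style uniform init in [-embedding_range, embedding_range] ...
relation = torch.empty(hidden_dim).uniform_(-embedding_range, embedding_range)
# ... is rescaled so the phases cover the whole circle [-pi, pi]:
phase = relation / (embedding_range / math.pi)
print(phase.abs().max())   # close to pi: rotations of any angle are possible

# Xavier-style values used directly as phases stay close to zero,
# so cos(phase) ~ 1, sin(phase) ~ 0 and h o r ~ h for every relation:
xavier = torch.empty(1, hidden_dim)
torch.nn.init.xavier_uniform_(xavier)
print(xavier.abs().max())  # small (~0.08 here): almost-null rotations
```

The first print is close to $\pi$, while the second stays near zero, which is exactly the almost-null-rotation regime described above.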

albernar avatar Jan 19 '24 15:01 albernar

For Question 3, after going back over the paper and comparing it with the code, I found an explanation. The original formulation is based on complex numbers, specifically Euler's formula:

$e^{i\theta} = \cos \theta + i \sin \theta$

Thus, in the code, the entity and relation embeddings are split into real and imaginary parts, so an entity and a relation can be written as:

$h = a + bi$

$r = c + di$

The rotation operation on the entity can then be written as:

$h \times r = (a + bi) \times (c + di) = ac + adi + bci + bdi^2 = ac - bd + (ad + bc)i$

This corresponds to the tail-batch (`else`) branch of the code in Question 3: the score is $h \times r - t$, with real and imaginary parts computed separately. For the head-batch branch, note that the relation embedding has unit modulus ($|r|^2 = \cos^2\theta + \sin^2\theta = 1$), so $h \times r = t$ is equivalent to $h = t \times \bar{r}$, where $\bar{r} = c - di$ is the complex conjugate, i.e. the inverse rotation. Expanding $t \times \bar{r}$ with $t = a + bi$ gives $(ac + bd) + (bc - ad)i$, which is exactly what the head-batch branch computes before subtracting the head. I hope this helps with understanding this part, and please feel free to correct me if there are any mistakes.
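As a sanity check, here is a small sketch (my own illustration, not repository code; it assumes unit-modulus relation embeddings, as RotatE uses) verifying that the two branches coincide with complex multiplication by $r$ and by $\bar{r}$:

```python
# Small check (my own illustration, not repository code) that the two
# branches are complex multiplication by r and by its conjugate conj(r).
import torch

d = 4
phase = torch.empty(d).uniform_(-torch.pi, torch.pi)
re_relation, im_relation = torch.cos(phase), torch.sin(phase)   # |r| = 1
re_head, im_head = torch.randn(d), torch.randn(d)
re_tail, im_tail = torch.randn(d), torch.randn(d)

h = torch.complex(re_head, im_head)
r = torch.complex(re_relation, im_relation)
t = torch.complex(re_tail, im_tail)

# tail-batch branch: h * r - t
re_score = re_head * re_relation - im_head * im_relation - re_tail
im_score = re_head * im_relation + im_head * re_relation - im_tail
assert torch.allclose(torch.complex(re_score, im_score), h * r - t)

# head-batch branch: t * conj(r) - h, i.e. rotating the tail backwards;
# because |r| = 1, h * r = t is equivalent to h = t * conj(r)
re_score = re_relation * re_tail + im_relation * im_tail - re_head
im_score = re_relation * im_tail - im_relation * re_tail - im_head
assert torch.allclose(torch.complex(re_score, im_score), t * r.conj() - h)
```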

fanglin1 avatar Sep 03 '24 08:09 fanglin1