cleora icon indicating copy to clipboard operation
cleora copied to clipboard

[paper] definition of M vs Fig. 2.

Open olszewskip opened this issue 2 years ago • 1 comments

Hi! Apologies if this is not the best place to ask this but. I've been reading Your paper and I hope You could clarify this little thing which is confusing to me:

  1. Section III.B offers the definition of matrix M, M_ab = e_ab/deg(v_a).
  2. Figure 2. presents toy example that includes explicit M matrices.

Are those latter matrices in 2. meant to follow the definition from 1.? If so, I must be misinterpreting something about this definition. Based on it I was expecting the rows of Ms in Fig. 2. to be normalized. Can I ask for example what is the value of deg(v_a) for the third row (i.e. v_a = p2_hash, if I understand correctly) of M_3 in Fig. 2. It seems that the entries in this row were constructed as follows:

  • taking v_b=u1_hash, we have e_ab=2 and deg(v_a)=4, and the entry in M_3 reads 1/2;
  • and taking v_b=u2_hash, we have e_ab=1 and deg(v_a)=3, and the entry in M_3 reads 1/3; so it seems that deg(v_a) is not just a function of v_a, which - for me - is not captured in the presentation in Section III.

What am I missing? Thanks in advance for any hints! :)

olszewskip avatar Apr 12 '22 09:04 olszewskip

Dear @olszewskip ! First of all, sorry for the delay. If the issue is urgent and we forget to reply, please feel free to ping us :)

As to your question: Just to be sure, I'll mention that Algorithm 1 in Section III and Figure 2 refer to different cases. Algorithm 1 refers to the case where we chunk the graph (that is, we cut the graph into N parts and then merge the resulting embeddings), but we only have one node type in the whole graph. This is not the same case as displayed in Figure 2 - there we handle multiple node (relation) types - users, products, brands - within one graph. Indeed, multiple matrices M are created but they are not graph chunks. They exist because technically Cleora handles 2D matrices with pairs of relations, and with more relations multiple such matrices must be created.

Now, to the issue with normalization - in Figure 2 each M matrix represents 2 node types, in such a case we decide to calculate transition matrix for transitions from node type 1 to node type 2 only. In case M_3 matrix, let's consider graph traversal always starting from the user nodes (u1_hash, u2_hash) towards product nodes (p1_hash, p2_hash, p3_hash). You will notice that values are indeed normalized by the number of edges running from each user node (user node being the "a" node in in M_ab = e_ab/deg(v_a) formula). If we had only 1 node type, all would be just like in Algorithm 1 and each row would be normalized. Figure 2 was in fact meant to show some complex non-standard use case were we could better display technical issues such as parallelism in computation. However, probably we should make this more clear and thanks for noticing.

Hope this helps! Barbara

barbara3430 avatar Jun 28 '22 15:06 barbara3430