TGN - calc embeddings for new nodes without edges
ENHANCEMENT
Is it possible to add the functionality of calculating embeddings for new nodes (which do not have any connections), so that later they can be used to predict connections?
DESCRIPTION
Hello! Thank you for developing Memgraph, the project is cool!
I am currently experimenting with TGN. Everything works fine, but one use case is missing: right now you can't compute embeddings for nodes that don't have any links at all.
As a result, when we add a node to the graph (initially without any connections) and then try to predict its connections to other nodes, that is not possible.
But, in the documentation it was mentioned that:
"node update/deletion events since they occur very rarely - although we have prepared a codebase to easily integrate them."
Then the question is - is it possible to add the functionality of calculating embeddings for new nodes (which do not have any connections), so that later they can be used to predict connections?
Thank you!
Hello!
First of all, I am glad you like our MAGE project :smile: and that TGN works fine. I will try to give you some insight into TGN and explain what could go wrong if we implement this feature, as I am not sure it would give you the desired results.
Is it possible to add the functionality of calculating embeddings for new nodes (which do not have any connections), so that later they can be used to predict connections?
Let's denote the dimension of the memory as memory_dim and the dimension of the feature vector as feature_dim. In our implementation, the final embedding dimension is embedding_dim = memory_dim + feature_dim. There are also projection matrices whose output feature dimension is again embedding_dim.
The problem is that, in the general case, the final embedding is built from the node's input feature vector and memory vector concatenated with its neighbors' embeddings, and the result is then projected back to embedding_dim. This means that we go from 2*embedding_dim to embedding_dim. But if there are no neighbors, a different (yet to be implemented) process is needed to get the final embedding, and a big part of that embedding will be all zeros. Why? Because in that case the final embedding consists only of the memory_dim + feature_dim values, and since the node has had no interactions and nothing has happened to it, its memory vector will be all zeros. For example, with a 128-dimensional final embedding (embedding_dim = 128, assuming memory_dim = feature_dim = 64), 64 dimensions will be filled with zeros and 64 dimensions with actual feature values.
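To make the dimension bookkeeping concrete, here is a rough sketch in plain Python. It is not MAGE's actual TGN code: the helper names (`node_state`, `project`), the random projection matrix, and the choice memory_dim = feature_dim = 64 are assumptions made only for illustration.

```python
import random

MEMORY_DIM = 64    # assumed, so that 64 + 64 = 128 as in the example above
FEATURE_DIM = 64   # assumed
EMBEDDING_DIM = MEMORY_DIM + FEATURE_DIM

def node_state(memory, features):
    """Concatenate a node's memory vector with its feature vector."""
    assert len(memory) == MEMORY_DIM and len(features) == FEATURE_DIM
    return memory + features

def project(vector, weights):
    """Apply a linear projection: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

rng = random.Random(0)

# Hypothetical projection matrix: 2 * EMBEDDING_DIM inputs -> EMBEDDING_DIM outputs.
projection = [[rng.uniform(-0.1, 0.1) for _ in range(2 * EMBEDDING_DIM)]
              for _ in range(EMBEDDING_DIM)]

# Connected node: own state concatenated with an aggregated neighbor embedding,
# then projected from 2 * EMBEDDING_DIM back down to EMBEDDING_DIM.
own_state = node_state([rng.random() for _ in range(MEMORY_DIM)],
                       [rng.random() for _ in range(FEATURE_DIM)])
neighbor_embedding = [rng.random() for _ in range(EMBEDDING_DIM)]
connected_embedding = project(own_state + neighbor_embedding, projection)

# Isolated new node: no interactions yet, so its memory is still all zeros
# and there is no neighbor embedding to concatenate or project.
isolated_embedding = node_state([0.0] * MEMORY_DIM,
                                [rng.uniform(0.1, 1.0) for _ in range(FEATURE_DIM)])

zero_count = sum(1 for x in isolated_embedding if x == 0.0)
print(len(connected_embedding), len(isolated_embedding), zero_count)  # 128 128 64
```

Half of the isolated node's embedding carries no information at all, which is the concern described above.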
So TGN could still predict some connections correctly based only on the similarity of the feature vectors, but that is less likely. The whole point of graph neural networks is to make the most of both the neighborhood and the feature vector.
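As a small illustration of why only the feature vectors would matter here, the sketch below scores two isolated nodes with cosine similarity. The cosine scorer, the tiny 4-dimensional vectors, and the variable names are assumptions for the example, not necessarily how TGN in MAGE scores links.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

memory_dim = 4  # kept tiny for readability
features_a = [0.2, 0.9, 0.1, 0.4]  # made-up feature vectors
features_b = [0.3, 0.8, 0.0, 0.5]

# Isolated nodes: all-zero memory concatenated with the feature vector.
embedding_a = [0.0] * memory_dim + features_a
embedding_b = [0.0] * memory_dim + features_b

full_score = cosine(embedding_a, embedding_b)
feature_score = cosine(features_a, features_b)

# The zero memory block adds nothing to the dot product or the norms.
print(math.isclose(full_score, feature_score))  # True
```

With this kind of scorer, the score for two isolated nodes is exactly the feature-only score, so no graph information is used.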
"node update/deletion events since they occur very rarely - although we have prepared a codebase to easily integrate them."
This part refers to the message calculation. But to calculate the full embedding, you still need the features of the neighboring nodes.
Then the question is - is it possible to add the functionality of calculating embeddings for new nodes (which do not have any connections), so that later they can be used to predict connections?
So, to summarize: it is possible, but it is less likely to predict new connections correctly.
Thanks for the detailed answer!
Now it is clear to me why this functionality (calculating and adding embeddings for new nodes without edges) was not added.
I also agree that in this case it is better to predict relationships through other methods (such as regular classification or multi-label classification).
Thanks again for the answer. I think this issue is resolved for me.