gnn
How to calculate similarity between the two graphs?
I need to learn a similarity measure between graphs using deep learning. I have many graph samples (~500k). On average, each graph has ~5000 nodes and ~4000 edges.
How can I compute a similarity score between two graphs? I am thinking of the following:
- convert graphs into vectors using Graph2Vec embedding
- then compare them using various similarity calculating techniques like cosine similarity.
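For illustration, the comparison step in the plan above could look like the following minimal sketch. It assumes each graph has already been turned into a fixed-length vector; the vectors `emb_a`, `emb_b`, and `emb_c` are made-up placeholders, not real Graph2Vec output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional graph embeddings (placeholders for illustration).
emb_a = [1.0, 0.0, 2.0, 0.0]
emb_b = [2.0, 0.0, 4.0, 0.0]  # points in the same direction as emb_a
emb_c = [0.0, 3.0, 0.0, 1.0]  # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))  # close to 1: same direction
print(cosine_similarity(emb_a, emb_c))  # 0: no overlap
```

Cosine similarity ignores vector length and compares direction only, so whether it is a meaningful score depends on how the embeddings were trained.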
I would really appreciate some feedback on whether this is a sound way to approach the problem.
Hi @smith-co,
What kind of samples do you have? Do you have labeled pairs of graphs that should be "similar" and pairs that should not be (positive and negative pairs)? Or do you only have unlabeled graphs?
I ask because, while methods like Graph2Vec work, each has a specific inductive bias, so its embeddings may or may not be useful for you. With any unsupervised (or in some way self-supervised) training method, you usually have to try it and see whether it works well for your application.
It is usually better to have labeled data of what should (and should not) be similar for your application, feed those pairs to a two-tower model (a.k.a. dual encoder), and use that to learn a good embedding. You can also start with an unsupervised training technique and then fine-tune.
Note that these comments on similarity (metric) training are generic and orthogonal to GNNs: you use a dual encoder and a similarity metric on top of a GNN, which in turn sits on top of your graph data. (I'm making some assumptions about your type of data here, but I hope you get the picture.)
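To make the dual-encoder idea concrete, here is a rough sketch in plain Python. It is not TF-GNN code: `encode` is a hypothetical stand-in (a linear map over made-up pooled graph features) for whatever GNN you would actually use, and the classic margin-based contrastive loss is just one common choice of metric-learning objective. The point is only that both towers share the same weights and that labeled pairs drive the loss.

```python
import math

def encode(graph_features, weights):
    """Stand-in encoder (assumption): a linear map over pooled graph features.
    In a real model this would be a GNN; both towers share `weights`."""
    return [sum(w * x for w, x in zip(row, graph_features)) for row in weights]

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(emb_1, emb_2, is_similar, margin=1.0):
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    d = euclidean_distance(emb_1, emb_2)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Hypothetical shared weights mapping 3 pooled features to a 2-d embedding.
weights = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]

# Made-up pooled features for three graphs.
g1 = [0.5, 0.5, 9.0]
g2 = [0.5, 0.5, -3.0]  # labeled similar to g1
g3 = [4.0, 0.5, 9.0]   # labeled dissimilar to g1

e1, e2, e3 = (encode(g, weights) for g in (g1, g2, g3))
loss_pos = contrastive_loss(e1, e2, is_similar=True)
loss_neg = contrastive_loss(e1, e3, is_similar=False)
print(loss_pos, loss_neg)
```

During training you would minimize this loss over many labeled pairs by updating the shared encoder weights. At inference time each graph is encoded once, and the distance (or cosine similarity) between two embeddings serves as the similarity score.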
An interesting related read is "Deep Metric Learning: A (Long) Survey". There are also a bunch of interesting links at github.com/qdrant/awesome-metric-learning.
I hope this helps!
Hi @smith-co ! There was no further activity on this issue, and I can't detect an actionable defect report or feature request for the TF-GNN library, so I'm going to mark this as closed. Feel free to reopen if you disagree.