verse
Not-so-good clustering in experiments
Hi!
I tried to use VERSE to visualize a not-so-large (nv: 23463, ne: 35923), well-clustered graph. I used the PPR version with --dim 2 (Total steps (mil): 2346.3), then used the two dimensions as x and y (after normalization) and pre-calculated cluster IDs (Louvain method) as colours to visualize the embedded graph.
I ended up with this:
I was expecting a visualization in which all clusters are perfectly separated, as in the example shown in your article.
Any idea which config I should use, or what was wrong with my procedure?
Using 128 dimensions and then using UMAP to reduce the result to x and y, I ended up with this:
Is this the right approach? Can I make it better?
Did you calculate the modularity of the Louvain clustering and of, say, a k-means clustering of the embedding? Are they comparable?
No. How can I do that? Feed the final two-dimensional embedding to sklearn or something? 🤔
P.S. I have often seen this approach of feeding higher-dimensional embeddings from VERSE or node2vec into UMAP to get a two-dimensional embedding for visualization, and it seems to work better than using e.g. VERSE to get a two-dimensional embedding directly. But I don't get it. Isn't UMAP just another embedding tool like VERSE and node2vec, only with a different approach?
Personally, I would feed the 128d embeddings to k-means.
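To make the modularity comparison concrete, here is a minimal, dependency-free sketch of the idea: cluster the embedding with k-means, then score both that partition and the reference (e.g. Louvain) partition with Newman modularity on the original graph. Everything in it is illustrative: the toy graph, the 2-d points standing in for an embedding, and the names `modularity`, `kmeans`, `reference_labels` are assumptions, not part of VERSE. In practice `sklearn.cluster.KMeans` and `networkx.algorithms.community.modularity` should do the same job on the real data.

```python
import random
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; labels: dict mapping node -> cluster id.
    Q = sum over clusters c of [ L_c / m - (d_c / 2m)^2 ].
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # L_c: edges with both endpoints in cluster c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    deg_sum = defaultdict(int)  # d_c: total degree of the nodes in cluster c
    for node, d in degree.items():
        deg_sum[labels[node]] += d
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return assign


# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Stand-in for the Louvain cluster IDs (hypothetical values).
reference_labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

# Stand-in for the embedding rows, one per node (hypothetical values).
embedding = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
             (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]

kmeans_labels = dict(enumerate(kmeans(embedding, k=2)))

q_reference = modularity(edges, reference_labels)
q_kmeans = modularity(edges, kmeans_labels)
print(f"modularity, reference partition: {q_reference:.4f}")
print(f"modularity, k-means on embedding: {q_kmeans:.4f}")
```

If the k-means modularity on the 128d embedding comes out close to the Louvain one, the embedding has kept the cluster structure and the problem is more likely in the projection to 2-d than in the embedding itself.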
Regarding 2d vs. 128d embeddings: the objective functions of UMAP and t-SNE are tailored to the visualization task. VERSE is a bit different: it aims at similarity preservation for the analysis of graphs.
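For reference, a rough side-by-side of the VERSE and t-SNE objectives, written from the respective papers. The notation is an assumption for this note: $W$ is the VERSE embedding matrix, $\mathrm{sim}_G$ the chosen graph similarity (PPR here), $y_i$ the 2-d points, and $p_{ij}$ the input affinities of t-SNE. UMAP is left out, but it also optimizes a low-dimensional layout under a distance-based kernel.

```latex
% VERSE: match the graph similarity distribution of each node with a
% softmax over dot products in embedding space.
\mathcal{L}_{\mathrm{VERSE}}
  = -\sum_{v \in V} \mathrm{sim}_G(v,\cdot)^{\top} \log \mathrm{sim}_E(v,\cdot),
\qquad
\mathrm{sim}_E(v,u) = \frac{\exp(W_v \cdot W_u)}{\sum_{i} \exp(W_v \cdot W_i)}

% t-SNE: match input affinities with a heavy-tailed kernel over
% Euclidean distances between the low-dimensional points.
\mathcal{L}_{\mathrm{t\text{-}SNE}}
  = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
```

The dot-product softmax has no particular reason to produce well-separated blobs when its two coordinates are plotted as x and y, whereas the distance-based, heavy-tailed kernel is designed so that Euclidean proximity in the plane is what carries the meaning.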
I'll test it that way later. 🤔
The difference in how the objective functions are designed is an important point. I haven't dug very deep into UMAP. Thank you!