verse Not-so-good clustering in experiments

Hi! I tried to use VERSE to visualize a not-so-large (nv: 23463, ne: 35923), well-clustered graph. I used the PPR version with --dim 2 (Total steps (mil): 2346.3). I then used the two dimensions as x and y (after normalization), and the precomputed cluster IDs (Louvain method) as colours, to visualize the embedded graph. I ended up with this:

[Figure: Untitled]

I was expecting a visualization in which all clusters are perfectly separated, as in the example shown in your article. Any idea which configuration I should use, or what was wrong with my procedure?
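A minimal sketch of the loading and normalization step described above. It assumes VERSE writes its output as a headerless, row-major array of float32 values (n × dim) in node-ID order; check this against the VERSE README for your version. `load_verse_embeddings` and `minmax_normalize` are hypothetical helper names, not part of VERSE.

```python
import numpy as np


def load_verse_embeddings(path, num_nodes, dim):
    """Read a VERSE output file into a (num_nodes, dim) array.

    Assumption: the file is a raw, row-major dump of float32 values
    with no header, one row per node in node-ID order.
    """
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size != num_nodes * dim:
        raise ValueError(
            f"expected {num_nodes * dim} floats, found {flat.size}; "
            "check num_nodes and dim"
        )
    return flat.reshape(num_nodes, dim)


def minmax_normalize(xy):
    """Scale each column independently into [0, 1] for plotting."""
    lo = xy.min(axis=0)
    span = xy.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on a constant column
    return (xy - lo) / span
```

For the 2D run above, `minmax_normalize(load_verse_embeddings(path, 23463, 2))` would give the x and y columns to scatter-plot, coloured by the Louvain labels.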

Feb 21 '21 09:02 hadisfr

Using 128 dimensions and then UMAP to reduce the result to x and y, I ended up with this:

[Figure: Figure_1]

Is this the right approach? Can I make it better?

Feb 21 '21 18:02 hadisfr

Did you calculate the modularity of the Louvain partition and of, say, a k-means clustering of the embedding? Are they comparable?
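For reference, the quantity being compared is the standard (Newman-Girvan) modularity of a partition. For an undirected graph with adjacency matrix $A$, vertex degrees $k_i$, $m$ edges, and community assignment $c_i$ (from Louvain, or from k-means on the embedding):

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```

If k-means on the embedding reaches a $Q$ close to Louvain's, the embedding likely captures the same cluster structure, even if a 2D plot of it does not look cleanly separated.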

Feb 21 '21 18:02 xgfs

No. How can I do that? Feed the final two-dimensional embedding to sklearn or something? 🤔

P.S. I have seen this approach many times: feeding the higher-dimensional embeddings of VERSE or node2vec into UMAP to get a two-dimensional embedding for visualization. It seems to work better than using, e.g., VERSE to produce a two-dimensional embedding directly. But I don't get why. Isn't UMAP just another embedding tool like VERSE and node2vec, only with a different approach?

Feb 21 '21 19:02 hadisfr

Personally, I would feed it the 128d embeddings.
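A minimal sketch of the modularity comparison, assuming the 128d embedding is already loaded as an (n, d) NumPy array, the graph is an undirected, unweighted edge list with integer node IDs 0..n-1 (each edge listed once, no self-loops), and `louvain_labels` holds the precomputed Louvain community IDs. To stay dependency-free it uses a plain NumPy k-means (Lloyd's algorithm) and a direct modularity computation; in practice `sklearn.cluster.KMeans` and NetworkX's modularity function do the same job. Setting k to the number of Louvain communities is one reasonable choice, not the only one. All function names here are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    X = np.asarray(X, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared distances via ||x||^2 - 2 x.c + ||c||^2 (an n x k matrix).
        d2 = ((X ** 2).sum(axis=1)[:, None]
              - 2.0 * X @ centers.T
              + (centers ** 2).sum(axis=1)[None, :])
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected graph."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    m = len(edges)
    deg = np.bincount(edges.ravel(), minlength=len(labels))
    # Observed fraction of edges whose endpoints share a community.
    intra = np.sum(labels[edges[:, 0]] == labels[edges[:, 1]]) / m
    # Expected fraction under the configuration model: sum_c (d_c / 2m)^2.
    comm_deg = np.bincount(labels, weights=deg)
    expected = np.sum((comm_deg / (2.0 * m)) ** 2)
    return intra - expected


def compare(emb, edges, louvain_labels, seed=0):
    """Return (Q of the Louvain partition, Q of k-means on the embedding)."""
    k = len(np.unique(louvain_labels))
    km_labels = kmeans(emb, k, seed=seed)
    return modularity(edges, louvain_labels), modularity(edges, km_labels)
```

Running k-means on the 128d vectors rather than on a 2D projection means the clustering sees the same information VERSE optimized, not a visualization-oriented reduction of it.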

Regarding 2d vs. 128d embeddings: the objective functions of UMAP and t-SNE are tailored to the visualization task. VERSE is a bit different; it preserves vertex similarities for the analysis of graphs.
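To make "similarity preservation" concrete, here is a sketch of the VERSE objective as I read it in the VERSE paper (not a full statement of the method). Given a vertex similarity distribution $\mathrm{sim}_G(v, \cdot)$ (personalized PageRank in the PPR version) and an embedding matrix $W$ with rows $W_v$, VERSE defines a softmax over dot products and minimizes the KL divergence between the two distributions, approximated in practice by sampling (noise-contrastive estimation):

```latex
\mathrm{sim}_E(v, \cdot) = \frac{\exp(W_v W^\top)}{\sum_{i=1}^{n} \exp(W_v \cdot W_i)},
\qquad
\min_W \sum_{v \in V} \mathrm{KL}\left( \mathrm{sim}_G(v, \cdot) \,\|\, \mathrm{sim}_E(v, \cdot) \right)
```

Nothing in this objective explicitly asks for visually separated clusters in two dimensions, which is one possible reading of why a 128d embedding passed through UMAP can look cleaner than a direct 2D VERSE embedding.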

Feb 22 '21 18:02 xgfs

I'll test it that way later. 🤔

The different approaches to designing objective functions are an important point. I did not dig too deeply into UMAP. Thank you!

Feb 22 '21 18:02 hadisfr
