assemble icon indicating copy to clipboard operation
assemble copied to clipboard

Community detection using spectral matrix analysis and clustering

Open henripal opened this issue 8 years ago • 9 comments

The idea here is to treat the graph matrix as a feature matrix and to use traditional dimension reduction/clustering techniques on these features.

An example workflow would be:

  • build the sparse follow matrix from the graph representation of the network
  • SVD or PCA the result http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
  • use classic clustering techniques on the result http://scikit-learn.org/stable/modules/clustering.html

good testing ground is the twitter #far-right data.

also check out great post and tutorial by @alejandrox1 https://github.com/Data4Democracy/discursive/issues/4 https://github.com/Data4Democracy/tutorials

henripal avatar Feb 01 '17 21:02 henripal

I can start working on this one. I'm thinking doing spectral clustering on graph Laplacian (instead of the adjacency matrix itself). How are we going to test the algorithm though? (do we have the labels?) I don't know where to find the #far-right data.

ashkan-leo avatar Feb 01 '17 23:02 ashkan-leo

@ashkan-leo added you to github org so I can assign this one to you. Please ping me on slack @bstarling to get the far-right data.

bstarling avatar Feb 02 '17 01:02 bstarling

@ashkan-leo we don't have labels. How to evaluate the results is a great question. We could be rank users (by number of followers or PageRank) then try to manually identify some communities using the top ranked users as a guideline & comparing to the algorithmically generated communities?

henripal avatar Feb 03 '17 20:02 henripal

Hi all.

My personal taste for large scale linear algebra problems is to first give it a go with Julia. The base svds is as powerful as I like it. shttp://docs.julialang.org/en/stable/stdlib/linalg/#Base.svds

We also have improved algos for large networks through https://github.com/nassarhuda/MatrixNetworks.jl and https://github.com/JuliaGraphs/LightGraphs.jl

gvdr avatar Feb 03 '17 21:02 gvdr

e.g., truncated SVD (10 singular values computed) on a sparse 45600x45600 matrix on my laptop: 16.078000 seconds (3.13 M allocations: 1.117 GB, 0.80% gc time)

gvdr avatar Feb 03 '17 22:02 gvdr

@gvdr I'm prototyping in julia as well and love it; but definitely not a problem if anyone else want to prototype in their favorite language either at this stage, I guess?

henripal avatar Feb 04 '17 04:02 henripal

Absolutely! I was thinking in terms of infrastructures: if we end setting up a virtual environment where to do analysis (wherever it is) let's make it open to Julia as well, not only python and R ;-)

gvdr avatar Feb 04 '17 18:02 gvdr

I reached out to the eventador folks about adding Julia kernel to the exiting notebook (they already added R). Next question will probably be in regards to packaging. Do you have a list of most common that you would want pre install you can post in channel or DM me? FYI domino who has donated compute infrastructure has Julia kernel as well.

bstarling avatar Feb 04 '17 19:02 bstarling

Curious, is this still being worked on? I am interested in helping.

JoeMcEwen avatar Nov 28 '17 03:11 JoeMcEwen