assemble
assemble copied to clipboard
Community detection using spectral matrix analysis and clustering
The idea here is to treat the graph matrix as a feature matrix and to use traditional dimension reduction/clustering techniques on these features.
An example workflow would be:
- build the sparse follow matrix from the graph representation of the network
- SVD or PCA the result http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
- use classic clustering techniques on the result http://scikit-learn.org/stable/modules/clustering.html
good testing ground is the twitter #far-right data.
also check out great post and tutorial by @alejandrox1 https://github.com/Data4Democracy/discursive/issues/4 https://github.com/Data4Democracy/tutorials
I can start working on this one. I'm thinking doing spectral clustering on graph Laplacian (instead of the adjacency matrix itself). How are we going to test the algorithm though? (do we have the labels?) I don't know where to find the #far-right data.
@ashkan-leo added you to github org so I can assign this one to you. Please ping me on slack @bstarling to get the far-right data.
@ashkan-leo we don't have labels. How to evaluate the results is a great question. We could be rank users (by number of followers or PageRank) then try to manually identify some communities using the top ranked users as a guideline & comparing to the algorithmically generated communities?
Hi all.
My personal taste for large scale linear algebra problems is to first give it a go with Julia. The base svds is as powerful as I like it. shttp://docs.julialang.org/en/stable/stdlib/linalg/#Base.svds
We also have improved algos for large networks through https://github.com/nassarhuda/MatrixNetworks.jl and https://github.com/JuliaGraphs/LightGraphs.jl
e.g., truncated SVD (10 singular values computed) on a sparse 45600x45600 matrix on my laptop: 16.078000 seconds (3.13 M allocations: 1.117 GB, 0.80% gc time)
@gvdr I'm prototyping in julia as well and love it; but definitely not a problem if anyone else want to prototype in their favorite language either at this stage, I guess?
Absolutely! I was thinking in terms of infrastructures: if we end setting up a virtual environment where to do analysis (wherever it is) let's make it open to Julia as well, not only python and R ;-)
I reached out to the eventador folks about adding Julia kernel to the exiting notebook (they already added R). Next question will probably be in regards to packaging. Do you have a list of most common that you would want pre install you can post in channel or DM me? FYI domino who has donated compute infrastructure has Julia kernel as well.
Curious, is this still being worked on? I am interested in helping.