Add community detection algorithms
Just to get the ball rolling, I have been looking at what the other libraries offer for this.
HGX
- Hy-MMSBM (hypergraph mixed-membership stochastic block model). They do the fitting part using EM.
- Hy-SC (spectral clustering). I'd have to read the paper for the details.
- Hypergraph-MT (mixed memberships, assortative structures), also here EM.
- Hyperlink communities. Hierarchical clustering on hyperedges distances/similarities.
HypernetX
- Modularity stuff. They have two methods that apply Louvain to maximize an extension of modularity to hypergraphs (I think the one defined here--I have just skimmed the papers).
Which ones do we want to implement? Do we have a priority list or something of the kind?
I would go first for the most "basic" and standard one, where by "standard" I mean that, even if it isn't standard for hypergraphs yet, its graph analogue is. Bonus points if it's not the slowest.
Any idea which one would fit? For example I like the mixed-membership ones, but maybe mixed-membership is already kind of more advanced?
I think Louvain-style algorithm would be simplest. In fact, I was hoping to implement the Kaminski et al. paper at one point: https://arxiv.org/abs/1810.04816. Maybe this is a good place to start?
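Not the Kaminski et al. method itself, but for orientation: the local-moving pass that Louvain-style algorithms share can be sketched as below. The `quality` callback is a stand-in for whichever hypergraph modularity we settle on (the function name and signature here are placeholders, not a proposed API); a real implementation would use incremental delta-Q updates and an aggregation phase instead of re-scoring the whole partition on every trial move.

```python
def greedy_local_moves(nodes, neighbors, quality):
    """One Louvain-style local-moving pass (illustrative skeleton only).

    `quality` maps a {node: community} dict to a score (e.g. some
    hypergraph modularity); `neighbors` maps each node to its adjacent
    nodes. Repeatedly moves single nodes to the neighboring community
    that most improves the score, until no move helps.
    """
    comm = {v: v for v in nodes}  # start from singleton communities
    improved = True
    while improved:
        improved = False
        for v in nodes:
            best, best_q = comm[v], quality(comm)
            # Only consider communities of neighbors, as Louvain does.
            for c in {comm[u] for u in neighbors[v]}:
                trial = dict(comm)
                trial[v] = c
                q = quality(trial)
                if q > best_q:  # strict improvement only, so it terminates
                    best, best_q = c, q
            if best != comm[v]:
                comm[v] = best
                improved = True
    return comm
```

Since the quality function is a parameter, the same skeleton would serve for graph modularity, a hypergraph extension, or anything else we want to greedily maximize.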
mmm yes... modularity and link communities (which allow for mixed memberships on the nodes, @maximelucas) should be quite easy to implement and relatively fast... as for the "standard" question, it depends on who you ask (aaaargh).
but yeah, I guess we can have them... if no one calls dibs I can work on it.
> Hy-SC (spectral clustering). I'd have to read the paper for the details.
I've been working with these spectra for a while now and could contribute to using them for clustering. The paper suggests a heuristic that boils down to running K-means on the first s-many components of the eigenvectors. From the pyproject.toml file, it doesn't look like scikit-learn is a dependency, so I could either implement a basic version of K-means as part of this addition or we could add scikit-learn as a dependency. Adding it just for K-means seems like a bit much, though...
Let me know what y'all think and I can maybe make the PR this weekend!
Hey @kaiser-dan, this would be great! I have been putting this off for too long :/
idk what's the best way to proceed with the sklearn dependency... let's see what the others have to say
If k-means isn't too bad to implement, I say we include it as a small helper function in the same module and avoid the sklearn dependency. Thanks @kaiser-dan! This is a long-overdue issue.
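For scale, here is roughly what such a helper could look like with only numpy (which xgi already depends on). The name and signature are placeholders; a real version would probably want k-means++ seeding and a convergence tolerance.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=None):
    """Minimal Lloyd's algorithm; returns a length-n array of cluster labels.

    X: (n, d) array of points, e.g. rows of a spectral embedding.
    Uses random initialization, so results can vary across seeds.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its points,
        # leaving it in place if its cluster is empty.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels
```

About twenty lines, so keeping it in-module and skipping the sklearn dependency seems feasible.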
Okie I have started drafting some architecture to address this in the draft PR #665. I will work on it more this weekend, but I wanted to get your thoughts on it before I get too far in.
The short of it is that I suspect the numerous algorithms for community detection and their evaluation would necessitate a new module. I added that in the draft PR with the commdetect module (I'm not crazy about the name); I didn't find anything about setting up whole new modules in the contributing guide, but I realize a change like this would include significant contextual changes in the docs, tutorials, etc.
Do you all think I should keep developing this alongside the particular spectral_clustering algorithm, hopefully making our lives easier when we get to the other algorithms, or should I stick to a more minimal change? I am happy to do either!
I think that this sounds right. A module with corresponding utility functions (either in utils or in the module itself) sounds like the best path forward. Maybe I would call the module communities, but I'm not sure what other libraries do. Thanks for spearheading this!