Add community detection algorithms
Just to get the ball rolling, I have been looking at what the other libraries offer for this.
HGX
- Hy-MMSBM (hypergraph mixed-membership stochastic block model). They do the fitting part using EM.
- Hy-SC (spectral clustering). I'd have to read the paper for the details.
- Hypergraph-MT (mixed memberships, assortative structures), also here EM.
- Hyperlink communities. Hierarchical clustering on hyperedges distances/similarities.
HypernetX
- Modularity stuff. They have two methods that apply Louvain to maximize an extension of modularity to hypergraphs (I think the one defined here--I have just skimmed the papers).
Which ones do we want to implement? Do we have a priority list or something of the kind?
I would go first for the most "basic" and standard one, where by "standard" I mean that, even if it isn't standard for hypergraphs yet, its graph analogue is. Bonus points if it's not the slowest.
Any idea which one would fit? For example I like the mixed-membership ones, but maybe mixed-membership is already kind of more advanced?
I think Louvain-style algorithm would be simplest. In fact, I was hoping to implement the Kaminski et al. paper at one point: https://arxiv.org/abs/1810.04816. Maybe this is a good place to start?
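Not the Kaminski et al. method itself, but for orientation: the local-moving pass that Louvain-style algorithms share can be sketched as below. The `quality` callback is a stand-in for whichever hypergraph modularity we settle on (the function name and signature here are placeholders, not a proposed API); a real implementation would use incremental delta-Q updates and an aggregation phase instead of re-scoring the whole partition on every trial move.

```python
def greedy_local_moves(nodes, neighbors, quality):
    """One Louvain-style local-moving pass (illustrative skeleton only).

    `quality` maps a {node: community} dict to a score (e.g. some
    hypergraph modularity); `neighbors` maps each node to its adjacent
    nodes. Repeatedly moves single nodes to the neighboring community
    that most improves the score, until no move helps.
    """
    comm = {v: v for v in nodes}  # start from singleton communities
    improved = True
    while improved:
        improved = False
        for v in nodes:
            best, best_q = comm[v], quality(comm)
            # Only consider communities of neighbors, as Louvain does.
            for c in {comm[u] for u in neighbors[v]}:
                trial = dict(comm)
                trial[v] = c
                q = quality(trial)
                if q > best_q:  # strict improvement only, so it terminates
                    best, best_q = c, q
            if best != comm[v]:
                comm[v] = best
                improved = True
    return comm
```

Since the quality function is a parameter, the same skeleton would serve for graph modularity, a hypergraph extension, or anything else we want to greedily maximize.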
mmm yes... modularity and link communities (which allow for mixed memberships on the nodes, @maximelucas) should be quite easy to implement and relatively fast... as for the "standard" question, it depends on who you ask (aaaargh).
but yeah, I guess we can have them... if no one calls dibs I can work on it.
> Hy-SC (spectral clustering). I'd have to read the paper for the details.
I've been working with these spectra for a while now and could contribute to using them for clustering. The paper suggests a heuristic that boils down to running K-means on the first s-many components of the eigenvectors. From the pyproject.toml file, it doesn't look like scikit-learn is a dependency, so I could either implement a basic version of K-means as part of this addition or we could add scikit-learn as a dependency. Adding it just for K-means seems like a bit much, though...
Let me know what y'all think and I can maybe make the PR this weekend!
Hey @kaiser-dan, this would be great! I have been putting this off for too long :/
idk what's the best way to proceed with the sklearn dependency... let's see what the others have to say
If k-means isn't too bad to implement, I say we include it as a small helper function in the same module and avoid the sklearn dependency. Thanks @kaiser-dan! This is a long-overdue issue.
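For scale, here is roughly what such a helper could look like with only numpy (which xgi already depends on). The name and signature are placeholders; a real version would probably want k-means++ seeding and a convergence tolerance.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=None):
    """Minimal Lloyd's algorithm; returns a length-n array of cluster labels.

    X: (n, d) array of points, e.g. rows of a spectral embedding.
    Uses random initialization, so results can vary across seeds.
    """
    rng = np.random.default_rng(seed)
    # Initialize centroids at k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its points,
        # leaving it in place if its cluster is empty.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels
```

About twenty lines, so keeping it in-module and skipping the sklearn dependency seems feasible.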
Okie I have started drafting some architecture to address this in the draft PR #665. I will work on it more this weekend, but I wanted to get your thoughts on it before I get too far in.
The short of it is that I suspect the numerous algorithms for community detection and their evaluation would necessitate a new module. I added that in the draft PR with the commdetect module (I'm not crazy about the name); I didn't find anything about setting up whole new modules in the contributing guide, but I realize a change like this would include significant contextual changes in the docs, tutorials, etc.
Do you all think I should keep developing this alongside the particular spectral_clustering algorithm, hopefully making our lives easier when we get to the other algorithms, or should I stick to a more minimal change? I am happy to do either!
I think that this sounds right. A module with corresponding utility functions (either in utils or in the module itself) sounds like the best path forward. Maybe I would call the module communities, but I'm not sure what other libraries do. Thanks for spearheading this!