
Integrate graph objects into hdbscan

Open: JanRhoKa opened this issue 3 years ago • 1 comment

Adapt the HDBSCAN function for graphs.

Create a new metric called "graph", which runs HDBSCAN directly on a graph. The input is the graph's adjacency matrix in scipy.sparse.csr_matrix format.

I modified the _hdbscan_sparse_distance_matrix function for this, since no graph has to be computed from the data. An example is also provided under examples\plot_hdbscan_graph.py. It shows the communities calculated by HDBSCAN, the communities calculated using the precomputed metric, and the time saved by passing the graph directly into the function.
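For readers unfamiliar with the input format, here is a minimal sketch of building such an adjacency matrix with SciPy. The helper name `edges_to_csr`, the toy edge list, and its weights are assumptions made up for illustration; they are not taken from the PR or from examples\plot_hdbscan_graph.py. The connectivity check is included on the assumption that a single connected graph is the easiest case to cluster.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def edges_to_csr(edges, n_nodes):
    """Hypothetical helper: build a symmetric weighted adjacency matrix
    from (u, v, weight) triples."""
    rows, cols, weights = zip(*edges)
    upper = csr_matrix(
        (weights, (rows, cols)), shape=(n_nodes, n_nodes), dtype=np.float64
    )
    # Mirror every edge so the matrix describes an undirected graph.
    return upper.maximum(upper.T).tocsr()


# Toy graph (illustrative data): two triangles joined by one heavier edge.
edges = [
    (0, 1, 0.5), (1, 2, 0.7), (2, 0, 0.6),
    (3, 4, 0.4), (4, 5, 0.3), (5, 3, 0.5),
    (2, 3, 2.0),
]
adjacency = edges_to_csr(edges, n_nodes=6)

# Count connected components of the undirected graph.
n_components, _ = connected_components(adjacency, directed=False)
```

With this PR applied, `adjacency` would presumably be what gets passed to HDBSCAN together with `metric="graph"`; the exact call signature is not shown in this thread, so treat that as an assumption and see the example script for the actual usage.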

JanRhoKa, Apr 25 '22 13:04

Thanks for the updates. This looks like it is starting to come together. The test failures are due to importing networkx in the test module, but it doesn't seem to be actually used there. Perhaps you can remove the import.

It might also be useful (to help ensure the feature sees usage) to include some documentation for this. Perhaps a short tutorial on clustering a graph with hdbscan? Perhaps based on the example you already have?

lmcinnes, May 09 '22 19:05