DendrogramVisualizer for Agglomerative Clustering

Open rebeccabilbro opened this issue 8 years ago • 5 comments

A dendrogram illustrates how each cluster is composed by drawing a U-shaped link between a non-singleton cluster and its children. The top of the U-link indicates a cluster merge. The two legs of the U-link indicate which clusters were merged. The length of the two legs of the U-link represents the distance between the child clusters. It is also the cophenetic distance between original observations in the two child clusters.

See also:

  • https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.dendrogram.html
  • https://github.com/scikit-learn/scikit-learn/pull/3464
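The U-link description above can be checked numerically. This is a minimal sketch on made-up 1-D data, using only scipy calls; it compares the height of each U-link with the cophenetic distance between observations in the merged clusters. `no_plot=True` is used so the coordinates are returned without drawing anything.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, cophenet
from scipy.spatial.distance import squareform

# Four 1-D points forming two loose pairs (made-up data)
X = np.array([[0.0], [1.0], [5.0], [7.0]])

# Each row of Z is one merge: [child_a, child_b, distance, n_observations]
Z = linkage(X, method="single")

# no_plot=True returns the U-link coordinates without drawing
info = dendrogram(Z, no_plot=True)

# The top of each U-link sits at the merge distance
link_heights = sorted(max(d) for d in info["dcoord"])

# Cophenetic distance between two observations: the height of the
# merge that first places them in the same cluster
coph = squareform(cophenet(Z))

print(link_heights)                          # [1.0, 2.0, 4.0]
print(coph[0, 1], coph[2, 3], coph[0, 2])    # 1.0 2.0 4.0
```

Points 0 and 1 merge at distance 1, points 2 and 3 at distance 2, and the two pairs join at distance 4 (single linkage takes the closest gap, 5 - 1), so the U-link heights and the cophenetic distances agree.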

rebeccabilbro, Apr 20 '17 21:04

something to the tune of:

import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# Intended as a method on the visualizer; self.model is the fitted estimator
def plot_dendrogram(self, **kwargs):
    """
    Build a linkage matrix from the model's children_ and plot the
    dendrogram. The distance and position columns are placeholders,
    not true merge distances or cluster sizes.
    """
    # One row of children_ per merge
    n_merges = self.model.children_.shape[0]
    distance = np.arange(n_merges)
    position = np.arange(2, n_merges + 2)

    linkage_matrix = np.column_stack(
        [self.model.children_, distance, position]
    ).astype(float)

    fig, ax = plt.subplots(figsize=(15, 7))

    # dendrogram() returns a dict of plot data, so do not rebind ax to it
    dendrogram(linkage_matrix, orientation='left', ax=ax, **kwargs)

    # Hide the x-axis ticks; use booleans instead of the deprecated 'off' strings
    ax.tick_params(axis='x', bottom=False, top=False, labelbottom=False)
    plt.tight_layout()
    plt.show()
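The snippet above fills the distance and count columns with placeholders. A possible refinement, sketched here under assumptions, is to build the linkage matrix from real merge distances and true observation counts. The helper name `linkage_from_children` and the hand-written merges are hypothetical; with a fitted scikit-learn `AgglomerativeClustering`, the inputs would presumably be `model.children_` and `model.distances_` (the latter is only populated when `distance_threshold` is set or `compute_distances=True`).

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram


def linkage_from_children(children, distances, n_samples):
    """Build a scipy-style linkage matrix with true observation counts.

    Hypothetical helper: not part of yellowbrick, scipy, or scikit-learn.
    """
    children = np.asarray(children)
    counts = np.zeros(children.shape[0])
    for i, merge in enumerate(children):
        # Indices below n_samples are original observations (count 1);
        # larger indices refer to the cluster formed at row idx - n_samples
        counts[i] = sum(
            1 if idx < n_samples else counts[idx - n_samples] for idx in merge
        )
    return np.column_stack([children, distances, counts]).astype(float)


# Hand-written merges for 4 samples (illustrative values only)
children = [[0, 1], [2, 3], [4, 5]]
distances = [1.0, 2.0, 4.0]

Z = linkage_from_children(children, distances, n_samples=4)

# no_plot=True validates the matrix and returns plot data without drawing
info = dendrogram(Z, no_plot=True)
print(Z[:, 3])  # observations under each merge
```

Passing the resulting matrix to `dendrogram` with an `ax` argument would draw it with leg lengths that reflect the actual merge distances.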

rebeccabilbro, Apr 21 '17 16:04

I have code for dendrogram simplification and plotting the resulting pruned/condensed dendrogram as part of my clustering project (http://github.com/scikit-learn-contrib/hdbscan). The condense_tree routine in hdbscan/_hdbscan_tree.pyx handles tree simplification, and there is code in hdbscan/plots.py that does the plotting. Feel free to steal whatever looks useful.
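For a rough sense of what a simplified dendrogram looks like, here is a small sketch using scipy's built-in truncation. This is not hdbscan's condense_tree algorithm; it is a simpler stand-in that collapses everything below the last few merges into single leaves. The data and the choice of `p` are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Made-up data: 30 random 2-D points
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

Z = linkage(X, method="ward")

# Keep only the top of the tree: everything below is contracted into
# p leaves, each labelled with the number of observations it contains
p = 5
info = dendrogram(Z, truncate_mode="lastp", p=p, no_plot=True)

print(info["ivl"])  # leaf labels; contracted clusters show their size in parentheses
```

Dropping `no_plot=True` and passing a matplotlib `ax` would draw the truncated tree instead of only returning its layout.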

lmcinnes, May 21 '17 19:05

@lmcinnes - this is excellent, thank you!

rebeccabilbro, May 22 '17 20:05

Another option is the treemap:

http://scipy-cookbook.readthedocs.io/items/Matplotlib_TreeMap.html

bbengfort, Mar 27 '18 18:03