hdbscan

HDBSCAN flat returns more than `n_clusters`

Open · KKJSP opened this issue 3 years ago · 1 comment

Observed: The HDBSCAN flat module (documented here) is supposed to return a fixed number of clusters, controlled by the `n_clusters` parameter. I came across a sample for which it returns more than the requested number of clusters.

Expected: HDBSCAN flat should return exactly `n_clusters` clusters for every input.

Code and data: Here is a simple dataset for which HDBSCAN flat returns more than `n_clusters` clusters: data.csv

Here is the code:

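```python
import pandas as pd
from hdbscan import flat

# Load the attached sample data (data.csv)
df = pd.read_csv("data.csv")

# Ask HDBSCAN flat for exactly 3 clusters
clustering = flat.HDBSCAN_flat(df, min_samples=2, min_cluster_size=2, n_clusters=3)

# Print the distinct labels; -1 marks noise points
print(set(clustering.labels_))
```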

This prints `{0, 1, 2, 3, -1}`, i.e. four clusters (0, 1, 2, and 3) plus the noise label -1, instead of the three requested.
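To make the mismatch explicit, here is a minimal sketch of a check that could be run on `clustering.labels_`. It assumes, as HDBSCAN does, that noise points carry the label -1, so noise is excluded from the cluster count. `count_clusters` and `check_n_clusters` are hypothetical helpers for illustration, not part of the hdbscan API.

```python
from typing import Iterable

NOISE_LABEL = -1  # HDBSCAN labels noise points with -1


def count_clusters(labels: Iterable[int]) -> int:
    """Return the number of distinct clusters, ignoring the noise label."""
    return len({label for label in labels if label != NOISE_LABEL})


def check_n_clusters(labels: Iterable[int], n_clusters: int) -> bool:
    """Hypothetical check: True only if exactly n_clusters clusters were found."""
    return count_clusters(labels) == n_clusters


# Label set reported in this issue for n_clusters=3
reported = {0, 1, 2, 3, -1}
print(count_clusters(reported))       # 4
print(check_n_clusters(reported, 3))  # False
```

Applied to the script above, `check_n_clusters(clustering.labels_, 3)` would return `False` for the attached data, given the reported labels.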

KKJSP, Sep 16 '22 10:09

I would be willing to take this on. Could this be assigned to me, please?

traderjoesbrownielover, Sep 16 '22 18:09