hdbscan
Strange clustering
Code:
import hdbscan
import numpy as np

x = np.array([
    [0.36789608, 0.17779213, 0.83797550, 0.77753013],
    [0.36628222, 0.17353597, 0.83745314, 0.78465497],
    [0.37088317, 0.17572623, 0.84084779, 0.78386849],
    [0.36569396, 0.17433393, 0.83739746, 0.78440967],
    [0.36793751, 0.17673337, 0.84037548, 0.78139651],
    [0.36722952, 0.17239252, 0.83743829, 0.78435159],
    [0.88804066, 0.81364667, 0.99931133, 1.0],  # outlier
    [0.36865044, 0.18000209, 0.83752632, 0.78532994],
    [0.36644703, 0.17631954, 0.83802074, 0.78327519],
])

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=2,  # cannot be changed
    min_samples=1,  # cannot be changed
)
clusterer.fit(x)
print(clusterer.labels_)
Output:
[-1 -1 -1 -1 -1 -1 -1 -1 -1]
But I expected something like this:
[0 0 0 0 0 0 -1 0 0]
How do I correct this behavior?
Set allow_single_cluster=True?
It works. Thank you.