hdbscan
Prediction Data Generation Fails w/ a Warning
Hello,
I computed the following distance matrix (dist_matrix.npy.zip) with a custom distance function:
```python
from scipy.spatial.distance import pdist, squareform

# X is my input array; calc_distance is my own function comparing the first element of each row.
dist_condensed = pdist(X, metric=lambda u, v: calc_distance(u[0], v[0]))
dist_matrix = squareform(dist_condensed)
```
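For reference, here is a minimal, self-contained sketch of the same construction. `calc_distance`, `X`, and `X_new` are hypothetical stand-ins, since the real function and data are not shown. It illustrates that `pdist` and `cdist` call a callable metric with whole rows (which is why `u[0]` is needed), and that `cdist` with the same callable gives the (new points × training points) distance matrix that any distance-based prediction would need.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

# Hypothetical stand-in for the real calc_distance: absolute difference
# between two scalars. Replace with the actual function.
def calc_distance(a, b):
    return abs(a - b)

# pdist/cdist pass whole rows (1-D arrays) to a callable metric, so with a
# single feature column each row holds one value and u[0] extracts it.
X = np.array([[0.0], [1.0], [5.0], [6.0]])

dist_condensed = pdist(X, metric=lambda u, v: calc_distance(u[0], v[0]))
dist_matrix = squareform(dist_condensed)  # (n, n), symmetric, zero diagonal

# Distances from hypothetical new points to every training point, shape (m, n).
X_new = np.array([[0.5], [5.5]])
new_to_train = cdist(X_new, X, metric=lambda u, v: calc_distance(u[0], v[0]))
```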
Then, I fitted a model using the attached distance matrix as follows:
```python
import hdbscan

clusterer = hdbscan.HDBSCAN(min_cluster_size=2, min_samples=2, metric='precomputed', prediction_data=True)
clusterer.fit(dist_matrix)
```
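Before fitting with `metric='precomputed'`, it may be worth sanity-checking the matrix. The helper below is a hypothetical sketch, not part of hdbscan; it checks the properties one would normally expect of a distance matrix: square, finite, non-negative, symmetric, and zero on the diagonal.

```python
import numpy as np

def check_precomputed(D, atol=1e-12):
    """Hypothetical helper: list problems with a matrix intended for
    HDBSCAN(metric='precomputed'). An empty list means no problems found."""
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        return ["not a square 2-D matrix"]
    problems = []
    if not np.all(np.isfinite(D)):
        problems.append("contains NaN or inf")
    if np.any(D < -atol):
        problems.append("contains negative distances")
    if not np.allclose(D, D.T, atol=atol):
        problems.append("not symmetric")
    if not np.allclose(np.diag(D), 0.0, atol=atol):
        problems.append("non-zero diagonal")
    return problems
```

Calling `check_precomputed(dist_matrix)` before `clusterer.fit(dist_matrix)` would surface an asymmetric or otherwise malformed matrix early.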
After running this, I got the following warning, which seems important if you want to predict new data later:
```
hdbscan/hdbscan_.py:1256: UserWarning: Cannot generate prediction data for non-vectorspace inputs -- access to the source data rather than mere distances is required!
```
I'd also like to predict the clusters of new data points later by passing a distance matrix. Does `approximate_predict` accept a distance matrix, given that I fitted the model on a custom precomputed one? Based on the documentation, I believe it does not.
I also tried the provided example (see here), but I got the same warning (see below).
data:image/s3,"s3://crabby-images/c5862/c5862519030a42984711cb0bab9ffe39f72c5381" alt="Screen Shot 2022-10-27 at 3 58 14 PM"
I'd appreciate help understanding why I get this warning and how I can use `approximate_predict` to predict the clusters of new data points.
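As a possible workaround while prediction data is unavailable for precomputed distances, one could assign each new point the label of its nearest clustered training point under mutual reachability distance. The sketch below is a hypothetical, simplified substitute, NOT hdbscan's `approximate_predict`: it ignores the condensed tree, returns no membership probabilities, and never labels a new point as noise. The function name and the core-distance convention (distance to the `min_samples`-th nearest *other* point) are assumptions.

```python
import numpy as np

def predict_from_distances(D_train, labels, D_new, min_samples):
    """Hypothetical workaround, NOT hdbscan.approximate_predict.

    D_train: (n, n) training distance matrix used for fitting.
    labels:  (n,) cluster labels of the training points (-1 = noise).
    D_new:   (m, n) distances from new points to the training points.
    Returns the label of each new point's nearest clustered training point
    under mutual reachability distance.
    """
    D_train = np.asarray(D_train, dtype=float)
    D_new = np.asarray(D_new, dtype=float)
    labels = np.asarray(labels)

    # Assumed core distance: distance to the min_samples-th nearest OTHER point.
    # Each training row contains the point itself at distance 0 (sorted index 0).
    core_train = np.sort(D_train, axis=1)[:, min_samples]
    core_new = np.sort(D_new, axis=1)[:, min_samples - 1]

    # Mutual reachability: max(core(a), core(b), d(a, b)), via broadcasting.
    mreach = np.maximum(np.maximum(core_new[:, None], core_train[None, :]), D_new)

    # Exclude training points labelled as noise from the candidate neighbours.
    clustered = labels >= 0
    if not clustered.any():
        return np.full(D_new.shape[0], -1)
    mreach = np.where(clustered[None, :], mreach, np.inf)
    nearest = np.argmin(mreach, axis=1)
    return labels[nearest]
```

With the fitted model above, a call might look like `predict_from_distances(dist_matrix, clusterer.labels_, D_new, min_samples=2)`, where `D_new` is the (m, n) matrix of distances from the new points to the training points (for example from scipy's `cdist` with the same callable). Because every new point receives a label, a distance threshold could be added if far-away points should be treated as noise.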
It seems this warning occurs whenever "precomputed" or a callable is passed as the `metric` parameter.
Not supporting "precomputed" seems reasonable, but a callable metric should be supported.