hdbscan

Get outlier_scores_ of new data points after using approximate_predict

Open qri-hub opened this issue 5 years ago • 3 comments

Hello, thanks for the amazing library. I've been experimenting with hdbscan for a while and wanted to know whether it is possible to get outlier scores (`outlier_scores_`) for new data points.

Let's imagine I have a train dataset and a test dataset. I want to cluster the train dataset to build the initial clusters, and then run approximate prediction on the test dataset. Finally, I want to get the outlier scores for the whole model, including the new data points from the test set.

I found that the `outlier_scores` function is called internally once the model has been fitted, and that it uses the condensed tree built from the data the model was fitted on. Is it possible to update the condensed tree when new data points are fed to the model?

```python
import hdbscan

model = hdbscan.HDBSCAN(prediction_data=True).fit(train_X)
labels, strengths = hdbscan.approximate_predict(model, test_X)

# Internal helper: it takes the raw condensed tree array (_condensed_tree), not condensed_tree_
outliers = hdbscan._hdbscan_tree.outlier_scores(model._condensed_tree)
scores = model.outlier_scores_  # len(scores) == len(train_X): training points only
```

qri-hub, Aug 06 '20 12:08

OK, my bad: there seems to be a function that is not in the documentation, `approximate_predict_scores`.

qri-hub, Aug 06 '20 12:08

However, the function is not accessible for some reason. I found it in the tests, but even after building the package manually with setup.py the function is still not available. Are you planning to include it in the next release, or is this a bug?

qri-hub, Aug 06 '20 13:08

Hi @qri-hub, I am able to use the function directly from the library: `hdbscan.approximate_predict_scores(clusterer, test)`

aloksmenthe, Mar 23 '22 05:03