hdbscan
Get outlier_scores_ of new data points after using approximate_predict
Hello, thanks for the amazing lib. I've been toying around with hdbscan for a while and wanted to know whether it is possible to get the outlier_scores_ for new data points.
Let's imagine I have a train and a test dataset. I want to cluster the train dataset to build the initial clusters and then perform approximate prediction on the test dataset. Finally, I want to be able to get the outlier scores of the whole model (including the new data points from test).
I have found that the outlier_scores function is called internally once the model has been fitted, and that it uses the condensed tree built from the data that was used to fit the model. Is it possible to update the condensed tree once new data points are fed to the model?
```python
import hdbscan

model = hdbscan.HDBSCAN(prediction_data=True).fit(train_X)
labels, strengths = hdbscan.approximate_predict(model, test_X)

# Internal helper; it expects the raw condensed tree array.
outliers = hdbscan._hdbscan_tree.outlier_scores(model._condensed_tree)
scores = model.outlier_scores_  # len(scores) == len(train_X)
```
OK, my bad: there seems to be a function that is not in the documentation, approximate_predict_scores.
However, the function is not accessible for some reason. I found it in the tests, but when I manually build the code with setup.py the function is still not available. Are you planning it for the next release, or is it a bug?
Hi @qri-hub, I am able to use the function from the library directly: hdbscan.approximate_predict_scores(clusterer, test)