Passing cluster_labels broken

Open mdruiter opened this issue 1 year ago • 2 comments

I think I have found a bug that occurs when passing some cluster_labels.

When I reverse the order of all inputs (data and cluster_labels) and then reverse the result (local_outlier_probabilities), I would expect to get the same numbers. That is what happens as long as all cluster_labels values are equal. But once I have two (clearly separated) clusters, the results change when flipped! An extra indication that things go wrong (IMHO): the neighbor indices of the second cluster point into the first cluster!

A small reproduction example:

import matplotlib.pyplot as plt
import numpy as np  # needed for the np.* calls below
from PyNomaly import loop

np.random.seed(1)
n = 9
data = np.append(np.random.normal(2, 1, [n, 2]), np.random.normal(8, 1, [n, 2]), axis=0)
clus = np.append(np.ones(n),                     2 * np.ones(n)).tolist()  # 2 cluster numbers!
model = loop.LocalOutlierProbability(data, n_neighbors=5, cluster_labels=clus)
fit = model.fit()
res = fit.local_outlier_probabilities
print(res)
print(fit.neighbor_matrix)

data_flipped = np.flipud(data)
clus_flipped = np.flipud(clus).tolist()
model2 = loop.LocalOutlierProbability(data_flipped, n_neighbors=5, cluster_labels=clus_flipped)
fit2 = model2.fit()
res2 = np.flipud(fit2.local_outlier_probabilities)
print(res2)
print(np.flipud(fit2.neighbor_matrix))

s  = 1 + 100 * res.astype(float)
s2 = 1 + 100 * res2.astype(float)
plt.scatter(data[:, 0], data[:, 1], c=clus, s=s,  marker='+')
plt.scatter(data[:, 0], data[:, 1], c=clus, s=s2, marker='x')
plt.show()
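
The expectation above (flip the inputs, flip the outputs back, get the same numbers) can also be written as a small generic check. This is only a sketch in plain Python: `is_flip_invariant`, `toy_score` and `order_dependent_score` are hypothetical names made up for illustration and are not part of PyNomaly. The two toy scoring functions stand in for any per-point score that takes data and cluster labels.

```python
def is_flip_invariant(score_fn, data, labels, tol=1e-9):
    """Return True if reversing the inputs merely reverses the scores."""
    forward = score_fn(data, labels)
    # Score the reversed inputs, then flip the result back into the original order.
    backward = score_fn(data[::-1], labels[::-1])[::-1]
    return all(abs(a - b) <= tol for a, b in zip(forward, backward))


def toy_score(data, labels):
    # Toy stand-in: distance of each value from the mean of its own cluster.
    scores = []
    for x, label in zip(data, labels):
        same = [y for y, other in zip(data, labels) if other == label]
        scores.append(abs(x - sum(same) / len(same)))
    return scores


def order_dependent_score(data, labels):
    # Deliberately flawed stand-in: every value is compared against the mean
    # of whichever cluster happens to come first in the input.
    first = labels[0]
    ref = [y for y, other in zip(data, labels) if other == first]
    mean = sum(ref) / len(ref)
    return [abs(x - mean) for x in data]


data = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = [1, 1, 1, 2, 2, 2]
print(is_flip_invariant(toy_score, data, labels))
print(is_flip_invariant(order_dependent_score, data, labels))
```

The first function only looks at a point's own cluster, so input order does not matter; the second depends on which cluster comes first, which is the kind of order dependence the reproduction above exposes.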

mdruiter, Mar 02 '23 22:03

The problem is in the 'definition' of neighbor_matrix: _compute_distance_and_neighbor_matrix returns indexes within the cluster, but _prob_distances_ev treats the numbers as being global.
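
To make the mismatch concrete, here is a minimal sketch of translating within-cluster neighbor indices into global row indices. It is not PyNomaly's actual code: `local_to_global` is a hypothetical helper, and the layout of `local_neighbors` (one list of within-cluster positions per row) is an assumption chosen for illustration.

```python
def local_to_global(cluster_labels, local_neighbors):
    """Map per-cluster neighbor indices to row indices in the full data set.

    local_neighbors[i] holds the neighbors of row i, counted by position
    within the cluster that row i belongs to (assumed layout).
    """
    # Global row indices of the members of each cluster, in input order.
    members = {}
    for row, label in enumerate(cluster_labels):
        members.setdefault(label, []).append(row)
    return [
        [members[label][j] for j in neighbors]
        for label, neighbors in zip(cluster_labels, local_neighbors)
    ]


labels = [1, 1, 1, 2, 2, 2]
# Within each cluster of three points, neighbors are numbered 0..2.
local = [[1, 2], [0, 2], [0, 1], [1, 2], [0, 2], [0, 1]]
global_neighbors = local_to_global(labels, local)
print(global_neighbors)
```

Read as global indices, the untranslated entries for the second cluster (rows 3 to 5) would all refer to rows 0 to 2, i.e. to points in the first cluster, which matches the symptom described in the report.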

mdruiter, Mar 06 '23 10:03

Hey @mdruiter - thanks for noting the issue and where it is occurring.

Are you able to submit a fix in a pull request?

vc1492a, Mar 20 '23 14:03