pyod
The difference between clf.predict(X_train) and clf.labels_
Why do
np.array_equal(clf.predict(X_train), clf.labels_)
and
np.array_equal(clf.decision_function(X_train), clf.decision_scores_)
both return
False?
Could anybody give me a clue?
Which specific algorithm are you referring to? It is understandable that some of them may not produce exactly the same result, for instance if contamination is not set to the exact percentage of outliers in the training set.
Thanks for your reply. This is my whole script:
from pyod.models.knn import KNN
import numpy as np
contamination=0.006
clf = KNN(contamination=contamination)
clf.fit(X_train)
x11 = clf.labels_
x22 = clf.predict(X_train)
print(np.array_equal(x11, x22))
Does it have something to do with contamination?
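To help narrow down the contamination question, here is a minimal NumPy-only sketch. It assumes that labels are obtained by thresholding the outlier scores at the 100 * (1 - contamination) percentile of the training scores; the names scores, threshold and label_from_scores are made up for illustration and are not PyOD's API. Under that assumption, applying the threshold twice to the same scores always gives the same labels, so a mismatch between labels_ and predict(X_train) would have to come from the scores themselves rather than from contamination alone.

```python
import numpy as np

rng = np.random.default_rng(0)
contamination = 0.006

# Hypothetical stand-in for outlier scores computed on the training set.
scores = rng.normal(size=1000)

# Assumed rule: the threshold is the (1 - contamination) percentile of the scores.
threshold = np.percentile(scores, 100 * (1 - contamination))


def label_from_scores(s, thr):
    """Mark a point as an outlier (1) when its score exceeds the threshold."""
    return (s > thr).astype(int)


labels_fit = label_from_scores(scores, threshold)      # plays the role of labels_
labels_predict = label_from_scores(scores, threshold)  # same scores, same threshold

print(np.array_equal(labels_fit, labels_predict))
print(labels_fit.sum(), "points flagged as outliers")
```

If this holds for the real model, comparing decision_function(X_train) with decision_scores_ directly is the more informative check.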
You can run the following code to see an example where decision_scores_ differs from decision_function(X) (unexpected behavior):
import numpy as np
from pyod.models.knn import KNN
rng = np.random.default_rng(42)
X = np.vstack(
(
rng.multivariate_normal([-3, -3], [[1, 0], [0, 1]], 300),
rng.multivariate_normal([3, 2], [[1, 0], [0, 1]], 200)
)
)
model = KNN()
model.fit(X)
np.testing.assert_array_equal(model.decision_scores_, model.decision_function(X))
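One possible explanation for the assertion failure above, offered as an assumption rather than a confirmed reading of the PyOD source: during fit, each training point may be scored against its neighbors with the point itself excluded, while decision_function treats every input as a new point, so a training point finds itself as its own nearest neighbor at distance 0. The sketch below reproduces that effect with plain NumPy; k and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))
k = 5  # hypothetical number of neighbors

# Pairwise Euclidean distances between all points in X.
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Sort each row; column 0 is the point's distance to itself (0.0).
sorted_dist = np.sort(dist, axis=1)

# "New point" view: the point itself counts as one of the k neighbors.
kth_with_self = sorted_dist[:, k - 1]

# "Training" view: the point itself is excluded from its own neighbors.
kth_without_self = sorted_dist[:, k]

print(np.array_equal(kth_with_self, kth_without_self))
print(np.all(kth_without_self >= kth_with_self))
```

If this is what happens in the model, scores from decision_function(X) on the training data would be systematically no larger than decision_scores_, and predict(X_train) could then flag fewer points than labels_.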