
The difference between clf.predict(X_train) and clf.labels_

Open planetb113 opened this issue 4 years ago • 3 comments

Why are
np.array_equal(clf.predict(X_train), clf.labels_)
and
np.array_equal(clf.decision_function(X_train), clf.decision_scores_)
both False?
Could anybody give me a clue?

planetb113 avatar Jun 08 '20 13:06 planetb113

Which specific algorithm are you referring to? It is understandable that some of them may not give exactly the same result, for instance if contamination is not set to the exact percentage of outliers in the training set.

yzhao062 avatar Jun 08 '20 16:06 yzhao062

Thanks for your reply. This is my whole script:

from pyod.models.knn import KNN  
import numpy as np

contamination=0.006  
clf = KNN(contamination=contamination)  
clf.fit(X_train)  
x11 = clf.labels_  
x22 = clf.predict(X_train)  
print(np.array_equal(x11, x22))  

Does it have something to do with contamination?

planetb113 avatar Jun 09 '20 12:06 planetb113

You can run the following code to reproduce a case where decision_scores_ differs from decision_function(X) (unexpected behavior):

import numpy as np
from pyod.models.knn import KNN

rng = np.random.default_rng(42)
X = np.vstack(
   (
      rng.multivariate_normal([-3, -3], [[1, 0], [0, 1]], 300),
      rng.multivariate_normal([3, 2], [[1, 0], [0, 1]], 200)
   )
)

model = KNN()
model.fit(X)
np.testing.assert_array_equal(model.decision_scores_, model.decision_function(X))

fdewez avatar May 10 '23 06:05 fdewez
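
One possible explanation for the mismatch reported above, offered as a sketch rather than a confirmed diagnosis: if the detector scores the training points during fit without counting each point as its own neighbor, while decision_function(X) treats X as new data, then every training point passed back in finds itself at distance 0 and its k-th neighbor distance shrinks. The NumPy-only example below imitates that situation without calling PyOD. The helper kth_neighbor_distance is hypothetical, and k = 5, contamination = 0.1, and the percentile-based threshold are assumptions chosen for illustration.

```python
import numpy as np


def kth_neighbor_distance(X_ref, X_query, k, exclude_self=False):
    """Distance from each query point to its k-th nearest reference point.

    Hypothetical helper for illustration only. With exclude_self=True,
    X_query must be X_ref itself; the zero self-distance on the diagonal
    is then ignored.
    """
    # Pairwise Euclidean distances, shape (n_query, n_ref).
    diff = X_query[:, None, :] - X_ref[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if exclude_self:
        np.fill_diagonal(dist, np.inf)
    return np.sort(dist, axis=1)[:, k - 1]


rng = np.random.default_rng(42)
X = np.vstack(
    (
        rng.multivariate_normal([-3, -3], np.eye(2), 300),
        rng.multivariate_normal([3, 2], np.eye(2), 200),
    )
)

k = 5                # assumed number of neighbors
contamination = 0.1  # assumed outlier fraction

# "Training" scores: each point is not its own neighbor.
train_scores = kth_neighbor_distance(X, X, k, exclude_self=True)
# "Query" scores on the same data: each point finds itself at distance 0.
query_scores = kth_neighbor_distance(X, X, k, exclude_self=False)

# Assumed labeling rule: flag scores above the (1 - contamination) percentile
# of the training scores.
threshold = np.percentile(train_scores, 100 * (1 - contamination))
train_labels = (train_scores > threshold).astype(int)
query_labels = (query_scores > threshold).astype(int)

scores_equal = np.array_equal(train_scores, query_scores)
labels_equal = np.array_equal(train_labels, query_labels)
print("scores equal:", scores_equal)
print("labels equal:", labels_equal)
print("flagged (train / query):", train_labels.sum(), "/", query_labels.sum())
```

In this sketch the query scores are never larger than the training scores, so re-scoring the training data can only flag the same points or fewer. If that is what happens inside the detector, clf.labels_ and clf.decision_scores_ are the values to use for the training set, and predict / decision_function are better reserved for unseen data.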