pyod
COPOD mixes train set and test set
Dear contributor(s),
I have a question with respect to the implementation of the COPOD algorithm.
If I am not mistaken, this implementation mixes the train set and the test set when decision_function(X) is called. During fitting, the train set is stored:
https://github.com/yzhao062/pyod/blob/7aeefcf65ceb0196434b7adb4fd706bfb404e4e2/pyod/models/copod.py#L121
When a test set is used to obtain the outlier scores, X_train gets concatenated with it:
https://github.com/yzhao062/pyod/blob/7aeefcf65ceb0196434b7adb4fd706bfb404e4e2/pyod/models/copod.py#L143
In the next steps, it looks like the parameters previously fitted by fit() are overwritten by new parameters computed from the concatenated X (train set + test set):
https://github.com/yzhao062/pyod/blob/7aeefcf65ceb0196434b7adb4fd706bfb404e4e2/pyod/models/copod.py#L125-L155
This behavior seems wrong, because the test set and the train set are no longer kept separate, which in general they should be. I would be happy to receive some clarification about this.
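To make the concern concrete, here is a minimal sketch. It is not pyod's code: ecdf_at, score_mixed and score_train_only are hypothetical helpers, and the example assumes only that the score depends on an empirical CDF, reduced here to one feature. It shows that when the ECDF is estimated on train + test, the value for one and the same test point depends on which other test points are scored with it.

```python
import numpy as np


def ecdf_at(reference, query):
    """Empirical CDF of `reference`, evaluated at each value in `query`."""
    reference = np.sort(np.asarray(reference, dtype=float))
    query = np.asarray(query, dtype=float)
    # Count how many reference values are <= each query value.
    return np.searchsorted(reference, query, side="right") / reference.size


def score_mixed(X_train, X_test):
    # Assumption: mimics the concatenation described above (train + test).
    combined = np.concatenate([X_train, X_test])
    return ecdf_at(combined, X_test)


def score_train_only(X_train, X_test):
    # Alternative: the ECDF is estimated from the train set alone.
    return ecdf_at(X_train, X_test)


X_train = np.array([1.0, 2.0, 3.0, 4.0])
x = 2.5  # the test point we care about

batch_a = np.array([x, 0.0, 0.1])   # x scored together with small values
batch_b = np.array([x, 9.0, 10.0])  # x scored together with large values

mixed_a = score_mixed(X_train, batch_a)[0]
mixed_b = score_mixed(X_train, batch_b)[0]
train_only_a = score_train_only(X_train, batch_a)[0]
train_only_b = score_train_only(X_train, batch_b)[0]

print("mixed:     ", mixed_a, mixed_b)
print("train only:", train_only_a, train_only_b)
```

In this toy setup the mixed ECDF value for x is 5/7 in the first batch and 3/7 in the second, while the train-only value is 0.5 in both.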
Kind regards
Similar to my comment in https://github.com/yzhao062/pyod/issues/395, I would suggest changing the docstring: https://github.com/yzhao062/pyod/blob/7aeefcf65ceb0196434b7adb4fd706bfb404e4e2/pyod/models/copod.py#L126
where 'fitted detector' should be removed, since it could mislead users into thinking that the parameters were previously learned from the train set.
Another question/concern involves the following line: https://github.com/yzhao062/pyod/blob/7aeefcf65ceb0196434b7adb4fd706bfb404e4e2/pyod/models/copod.py#L143
What will this method do when you also pass X_train via decision_function()? In that case the train set is concatenated with itself, which results in duplicated rows/samples. I am not sure whether this is behavior you want to allow. I am happy to hear from you.
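As a rough illustration of the duplication question, here is a small sketch. Again this is not pyod's code: ecdf_at is a hypothetical helper, and the example looks at a plain one-feature empirical CDF only, ignoring any other statistics the detector may compute. Under that assumption, concatenating the full train set with itself leaves the ECDF values unchanged, whereas concatenating only a subset of it shifts them.

```python
import numpy as np


def ecdf_at(reference, query):
    """Empirical CDF of `reference`, evaluated at each value in `query`."""
    reference = np.sort(np.asarray(reference, dtype=float))
    query = np.asarray(query, dtype=float)
    # Count how many reference values are <= each query value.
    return np.searchsorted(reference, query, side="right") / reference.size


X_train = np.array([3.0, 1.0, 4.0, 1.5, 9.0])

# Case 1: the whole train set is passed again, so every row appears twice.
doubled = np.concatenate([X_train, X_train])
ecdf_train_only = ecdf_at(X_train, X_train)
ecdf_doubled = ecdf_at(doubled, X_train)

# Case 2: only part of the train set is passed again.
subset = X_train[:2]
ecdf_subset_train_only = ecdf_at(X_train, subset)
ecdf_subset_mixed = ecdf_at(np.concatenate([X_train, subset]), subset)

print("full duplication changes the ECDF:   ",
      not np.allclose(ecdf_train_only, ecdf_doubled))
print("partial duplication changes the ECDF:",
      not np.allclose(ecdf_subset_train_only, ecdf_subset_mixed))
```

So in this simplified setting exact, complete duplication is harmless for the ECDF itself, but any partial overlap between the two sets is not.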
Kind regards.