Do any of the PyOD algorithms support unequal-length input?

I have time series data with rows of unequal length. There are no missing points, so there is no need to handle them. I want to perform outlier detection.

Here is a small example of the structure of my data:

timestamp	feature_1	feature_2	feature_3	feature_4	feature_5
timestamp_1	34	112	24	46	87
timestamp_2	99	12	66	16
timestamp_3	54	19
timestamp_4	1100

Do any of the algorithms implemented in PyOD support such input data? If not, can you suggest possible solutions?

Oct 04 '22 09:10 Okroshiashvili

I am not an expert on this, but I think you first have to answer this question: "From an outlier-detection perspective, what does it mean for a feature to have no value?" For example, if you don't care that timestamp_4 has no values for features 2 to 5, then you can impute all missing values with the feature average. Most outlier detection methods will treat the average as a "normal" value. The problem with this approach, however, is that it could "dilute" potentially abnormal values in features you do care about.

Another approach would be to use multiple different detectors. Each detector is trained on a subset of samples that share the same features (with values).
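A minimal sketch of the multiple-detector idea, assuming rows are plain Python lists and that "same features" simply means "same row length". The names `fit_per_length`, `make_detector`, `min_samples` and `MedianDistanceDetector` are hypothetical and not part of PyOD; the stand-in detector only mimics the `fit` / `decision_scores_` interface that PyOD detectors expose, so that the example runs without extra dependencies. The toy rows are made up for illustration.

```python
from collections import defaultdict
from statistics import median


class MedianDistanceDetector:
    """Stand-in detector (NOT a PyOD class): scores each row by its
    L1 distance to the per-column median of the training rows."""

    def fit(self, X):
        center = [median(col) for col in zip(*X)]
        self.decision_scores_ = [
            sum(abs(v - c) for v, c in zip(row, center)) for row in X
        ]
        return self


def fit_per_length(rows, make_detector, min_samples=2):
    """Fit one detector per row length; return (detectors, scores).

    Rows whose length group has fewer than `min_samples` members are
    left unscored (None), because there is nothing to compare them to.
    """
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[len(row)].append(idx)

    detectors = {}
    scores = [None] * len(rows)
    for length, idxs in groups.items():
        if len(idxs) < min_samples:
            continue
        det = make_detector()
        det.fit([rows[i] for i in idxs])
        for i, s in zip(idxs, det.decision_scores_):
            scores[i] = float(s)
        detectors[length] = det
    return detectors, scores


# Hypothetical toy data: three rows of length 2, two of length 3, one of length 1.
rows = [[34, 112], [36, 110], [35, 900], [99, 12, 66], [98, 14, 65], [1100]]
detectors, scores = fit_per_length(rows, MedianDistanceDetector)
for row, score in zip(rows, scores):
    print(row, score)
```

With PyOD installed, a factory such as `lambda: IForest()` (from `pyod.models.iforest`) could be passed instead of the stand-in, most likely with a larger `min_samples`. Note that scores from different groups come from different models and are not directly comparable.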

I hope this helped a little bit.

Oct 14 '22 13:10 mbongaerts

@mbongaerts Thanks a lot for your suggestions.

I don't think imputing values is a good idea, because these numbers are "strictly discrete": each number has its own meaning, so imputation by the average or any other method won't work.

Using multiple different detectors seems promising. However, I'm not sure how this will perform in production with lots of data.

Anyway, thanks for your input 👍

Oct 17 '22 06:10 Okroshiashvili

I think this is typical of time series or sequence data. Have you tried https://github.com/datamllab/tods for this? You will likely need a sliding window or padding.
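As a rough illustration of the two preprocessing options, here is a minimal sketch in plain Python. It does not use TODS or PyOD; `pad_rows`, `sliding_windows` and the fill value `-1` are hypothetical choices for this example. For strictly discrete values, the fill value should be a sentinel that never occurs in the real data.

```python
def pad_rows(rows, fill=-1):
    """Right-pad every row with `fill` so all rows share the longest length."""
    width = max(len(row) for row in rows)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


def sliding_windows(seq, size, step=1):
    """Cut one sequence into fixed-length, possibly overlapping windows."""
    return [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


# Rows taken from the example in the question.
rows = [[34, 112, 24, 46, 87], [99, 12, 66, 16], [54, 19], [1100]]

padded = pad_rows(rows)                  # rectangular: 4 rows x 5 columns
windows = sliding_windows(rows[0], size=3)  # windows of length 3 over one row

print(padded)
print(windows)
```

Either result is a rectangular list of equal-length rows, which is the shape that detectors expecting a 2-D feature matrix need. A sequence shorter than `size` yields no windows at all, so short rows would still need padding or separate handling.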

Oct 17 '22 18:10 yzhao062

Thanks @yzhao062 for your input. I will definitely try that.

Oct 17 '22 20:10 Okroshiashvili

I am almost positive all these algorithms will fail with jagged feature vectors as input. I had the same issue initially and had to make all my input data equal in length. I am curious, though, whether anyone has used any of them with unequal-length vectors.

Nov 09 '22 21:11 RyanZurrin
