Specifying categorical features in Python Outlier Detection (PyOD)

Open shivasheeshyadav opened this issue 6 years ago • 5 comments

How do I specify the categorical features in PyOD when using Histogram-based Outlier Detection (HBOS) for anomaly detection? I've read that HBOS can be used for anomaly detection when categorical features are involved. I found its Python implementation here: https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.hbos But I can't figure out how to pass the positions or the list of names of the categorical features of my dataset while training the model. The code I've tried:

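from pyod.models.hbos import HBOS

# train_df is the training DataFrame (it includes categorical columns)
clf = HBOS(n_bins=10, alpha=0.1, tol=0.5, contamination=0.1)
clf.fit(train_df)
train_pred = clf.labels_  # 0 = inlier, 1 = outlier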

There is no parameter for specifying categorical features during training.

shivasheeshyadav avatar Sep 14 '18 13:09 shivasheeshyadav

Hi there, sorry for the late response. Unfortunately, this functionality has not been implemented. One temporary workaround is to turn your categorical features into numerical ones (not a good idea, though). I will update you once this function is in place.

yzhao062 avatar Sep 17 '18 18:09 yzhao062

@yzhao062 Thanks. Which do you suggest then: label encoding or one-hot encoding of the categorical features?

shivasheeshyadav avatar Sep 19 '18 09:09 shivasheeshyadav
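To make the two options in the question above concrete, here is a small dependency-free sketch of what each encoding produces for a single column. The sample values and variable names are hypothetical, and the alphabetical category order is just one arbitrary choice.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# Fix a category order (alphabetical here; any order would do).
categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Label encoding: each category becomes one integer in a single column.
label_index = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_index[c] for c in colors]

# One-hot encoding: each category becomes its own 0/1 column.
one_hot_encoded = [
    [1 if c == cat else 0 for cat in categories]
    for c in colors
]

print(label_encoded)    # [2, 1, 0, 1]
print(one_hot_encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding keeps a single column but imposes an artificial order, so a histogram with fewer bins than categories may group unrelated categories into the same bin. One-hot encoding avoids that ordering at the cost of one extra column per category. Which works better for a given dataset is not settled in this thread.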

@yzhao062 @shivasheeshyadav, my dataset contains categorical features. Can PyOD process categorical features now? What do you suggest doing with the categorical features?

2994186010 avatar Apr 03 '19 02:04 2994186010

Any update on this essential feature for categorical anomaly detection?

Stevod avatar Sep 29 '20 10:09 Stevod

@Stevod, do you have any ideas on how you will handle this? I've been using this library for the last 3 months and am only now realizing that the accuracy could be higher, since all the features I'm working with are categorical.

alik604 avatar Oct 03 '20 20:10 alik604