Error in HBOS with n_bins="auto"

Open • markfried96 opened this issue 2 years ago • 1 comment

I'm trying to use the HBOS algorithm with n_bins="auto". PyOD fits the model without any problems, but when I call decision_function() on the trained model with a new data point, I get the following error:

"attempt to get argmax of an empty sequence"

The error is raised along this call chain: _calculate_outlier_scores_auto -> get_optimal_n_bins(X[:, i]) -> np.argmax(maximum_likelihood), which fails with "attempt to get argmax of an empty sequence".
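The message itself comes from NumPy: np.argmax raises ValueError when it is given an empty sequence, so maximum_likelihood must be empty for the test input. The sketch below shows one way that can happen. It assumes, without having checked the PyOD source, that the bin-count search scores every candidate count from 1 up to but excluding int(sqrt(n)) with a Birgé-Rozenblac-style penalized likelihood. search_n_bins is a hypothetical name, not PyOD's function. Under that assumption, any input with fewer than four samples produces no candidates at all.

```python
import numpy as np


def search_n_bins(x, epsilon=1.0):
    """Hypothetical bin-count search (an assumption, not PyOD's actual code).

    Scores each candidate bin count b with a penalized log-likelihood in the
    style of Birge-Rozenblac and returns the best one.
    """
    n = x.shape[0]
    upper_bound = int(np.sqrt(n))  # assumed cap on the number of bins
    scores = []
    for b in range(1, upper_bound):  # assumed exclusive upper bound
        hist, _ = np.histogram(x, bins=b)
        penalty = b - 1 + np.log(b) ** 2.5
        # epsilon keeps the log finite for empty bins
        scores.append(np.sum(hist * np.log(b * hist / n + epsilon)) - penalty)
    # With n < 4 the loop never runs, scores stays empty and np.argmax raises
    # ValueError: attempt to get argmax of an empty sequence
    return int(np.argmax(scores)) + 1


rng = np.random.default_rng(0)
print(search_n_bins(rng.normal(size=200)))  # a bin count between 1 and 13

try:
    search_n_bins(np.array([0.3]))  # a single new data point
except ValueError as exc:
    print(exc)
```

With 200 samples there are candidates to choose from; with one sample the candidate list is empty and the same ValueError as in the report appears.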

Does anyone have a quick solution? Am I doing something wrong?

markfried96 • Jan 30 '23 20:01

Hey, I just ran into this too. I think the cause is as follows:

When the detector is created with n_bins="auto", fit() calculates the optimal number of bins for each feature of the training data (see line 114):

https://github.com/yzhao062/pyod/blob/6c77e27a7a95fa928af37ff48c3dc607fa9408fa/pyod/models/hbos.py#L107-L115

This is expected, but it turns out that the same calculation is repeated on the new test data when predict() is called. The optimal_n_bins computed from the test data will generally differ from those of the training data, so the checks below line 235 are no longer valid. This can cause array out-of-bounds errors, and calling predict() with only one sample produces other errors as well (such as the argmax error reported above).

https://github.com/yzhao062/pyod/blob/6c77e27a7a95fa928af37ff48c3dc607fa9408fa/pyod/models/hbos.py#L235-L267

called via:

https://github.com/yzhao062/pyod/blob/6c77e27a7a95fa928af37ff48c3dc607fa9408fa/pyod/models/hbos.py#L172-L176
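The mismatch can be reproduced with plain NumPy. In this sketch, np.histogram_bin_edges(..., bins="auto") stands in for PyOD's own bin-count search (an assumption made only to keep the example short). The bin count picked for a three-point test batch differs from the one picked for the training data, and a test point outside the training range is mapped by np.digitize to index len(train_edges), one past the last valid bin.

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = rng.normal(size=500)
x_test = np.array([0.1, 0.2, 10.0])  # small batch; 10.0 lies far outside the training data

# Choose the bin count separately for each dataset (NumPy's "auto" rule is
# a stand-in for PyOD's bin-count search).
train_edges = np.histogram_bin_edges(x_train, bins="auto")
test_edges = np.histogram_bin_edges(x_test, bins="auto")
print("train bins:", len(train_edges) - 1, "test bins:", len(test_edges) - 1)

train_hist, _ = np.histogram(x_train, bins=train_edges)

# Locate the test points relative to the TRAINING bin edges.
# np.digitize returns 1..len(train_hist) for points inside the edges,
# 0 below the first edge and len(train_edges) at or above the last edge.
idx = np.digitize(x_test, train_edges)
print("bin indices:", idx, "valid range: 1 ..", len(train_hist))

# Looking up train_hist[idx - 1] without a range check would fail for 10.0:
# its index is len(train_edges), so idx - 1 == len(train_hist) is out of bounds.
```

Any logic that derives its bounds from the test-time bin count, rather than from the stored training edges, is checking against the wrong histogram.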

To fix this, _calculate_outlier_scores_auto() should not calculate a new number of bins. Instead, it should check whether the bin index returned by np.digitize() falls within the range of the training-phase bins, i.e. bin_edges[i].
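A minimal sketch of that fix, assuming a simplified HBOS score (per-feature bin heights normalized to the tallest bin, plus a regularizer alpha, summed as negative logs) and using NumPy's bins="auto" in place of PyOD's get_optimal_n_bins. fit_histograms and score are hypothetical helpers, not PyOD's API. The bin edges are fixed at fit time; at scoring time np.digitize maps each test value onto those training edges, and values outside the training range are treated as empty bins instead of triggering a new bin-count search.

```python
import numpy as np


def fit_histograms(X_train, bins="auto"):
    """Choose bin edges once per feature from the training data and keep them.

    Hypothetical helper; NumPy's bins="auto" stands in for PyOD's bin-count search.
    """
    edges, heights = [], []
    for j in range(X_train.shape[1]):
        e = np.histogram_bin_edges(X_train[:, j], bins=bins)
        h, _ = np.histogram(X_train[:, j], bins=e)
        edges.append(e)
        heights.append(h / h.max())  # tallest bin gets height 1
    return edges, heights


def score(X_test, edges, heights, alpha=0.1):
    """HBOS-style outlier scores against the stored training bins (higher = more anomalous)."""
    X_test = np.atleast_2d(X_test)
    scores = np.zeros(X_test.shape[0])
    for j, (e, h) in enumerate(zip(edges, heights)):
        col = X_test[:, j]
        n_bins = len(h)
        # Inside [e[0], e[-1]) np.digitize gives 1..n_bins; below e[0] it gives 0
        # and at or above e[-1] it gives n_bins + 1.
        idx = np.digitize(col, e)
        # np.histogram counts x == e[-1] in the last bin, so do the same here.
        idx[col == e[-1]] = n_bins
        inside = (idx >= 1) & (idx <= n_bins)
        density = np.zeros(col.shape[0])
        density[inside] = h[idx[inside] - 1]
        # Points outside the training range keep density 0, i.e. the largest penalty.
        scores -= np.log(density + alpha)
    return scores


rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
edges, heights = fit_histograms(X_train)
print(score(np.array([[0.0, 0.0]]), edges, heights))    # a single sample scores fine
print(score(np.array([[10.0, -10.0]]), edges, heights))  # 2 * log(10), about 4.61
```

Because no bin count is recomputed on the test batch, scoring one sample goes through exactly the same path as scoring many, and the range check replaces the out-of-bounds lookup.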

rkost • Aug 03 '23 15:08