pyod
Error in hbos with auto for n_bins
I'm trying to use the HBOS algorithm with n_bins set to "auto". PyOD creates the model without any problems, but when I call decision_function on the trained model with a new data point, I get the following error:
"attempt to get argmax of an empty sequence"
This happens in the following call chain: _calculate_outlier_scores_auto -> get_optimal_n_bins(X[:, i]) -> np.argmax(maximum_likelihood) -> "attempt to get argmax of an empty sequence"
Does anyone have a quick solution? Am I doing something wrong?
Hey, I just encountered this problem too. I think the problem is as follows:
When the detector is used with the n_bins="auto" parameter, it calculates the optimal number of bins for the training data when fit() is called (see line 114):
https://github.com/yzhao062/pyod/blob/6c77e27a7a95fa928af37ff48c3dc607fa9408fa/pyod/models/hbos.py#L107-L115
This is as expected, but it turns out that the same calculation is done again on the new test data when predict() is called. The resulting optimal_n_bins will differ from the value computed on the training data, so the checks below line 235 are no longer valid. This can cause out-of-bounds array errors, and it also produces other odd errors when predict() is called with only one sample.
https://github.com/yzhao062/pyod/blob/6c77e27a7a95fa928af37ff48c3dc607fa9408fa/pyod/models/hbos.py#L235-L267
called via:
https://github.com/yzhao062/pyod/blob/6c77e27a7a95fa928af37ff48c3dc607fa9408fa/pyod/models/hbos.py#L172-L176
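To illustrate why recomputing the bin count at prediction time is a problem, here is a small sketch. It does not use PyOD's get_optimal_n_bins; NumPy's built-in "auto" rule is an assumed stand-in, and the variable names are made up for the example. The point is only that a data-driven bin count depends on the sample it is computed from, so a small test batch will generally not reproduce the number of bins chosen during fit().

```python
import numpy as np

# Same value range, very different sample sizes (as with train vs. test data).
train = np.linspace(0.0, 1.0, 1000)
test = np.linspace(0.0, 1.0, 10)

# Stand-in for a data-driven bin rule; NOT PyOD's get_optimal_n_bins.
train_edges = np.histogram_bin_edges(train, bins="auto")
test_edges = np.histogram_bin_edges(test, bins="auto")

n_train_bins = len(train_edges) - 1
n_test_bins = len(test_edges) - 1

# The two counts differ, so bin indices derived from one set of edges
# cannot safely be used to index a histogram built from the other.
print("bins from training data:", n_train_bins)
print("bins from test data:", n_test_bins)
```

Any check written against the test-derived bin count is then comparing against the wrong histogram length.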
To fix this, _calculate_outlier_scores_auto() should check whether the bin index returned by np.digitize() is within range of the bins from the training phase, i.e. bin_edges[i], and should not calculate a new number of bins.
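A rough sketch of that idea follows. It is not the PyOD implementation or an actual patch: fit_histograms and score_with_training_bins are hypothetical helper names, the fixed n_bins=10 and alpha=0.1 are assumed values, and the scoring formula is a simplified HBOS-style negative log density. What it shows is scoring that reuses the bin edges stored at fit time and treats values outside the training range as falling into an empty bin.

```python
import numpy as np

def fit_histograms(X_train, n_bins=10):
    """Build one density histogram per feature and keep its bin edges."""
    hists, edges = [], []
    for i in range(X_train.shape[1]):
        hist, bin_edges = np.histogram(X_train[:, i], bins=n_bins, density=True)
        hists.append(hist)
        edges.append(bin_edges)
    return hists, edges

def score_with_training_bins(X, hists, edges, alpha=0.1):
    """Score samples using only the histograms learned during fit."""
    X = np.atleast_2d(X)  # also accepts a single sample
    scores = np.zeros(X.shape[0])
    for i in range(X.shape[1]):
        hist, bin_edges = hists[i], edges[i]
        n_bins = len(hist)
        # np.digitize returns 0 for values below the first edge and
        # len(bin_edges) for values at or above the last edge.
        idx = np.digitize(X[:, i], bin_edges)
        # Map to a valid 0-based histogram index.
        clipped = np.clip(idx - 1, 0, n_bins - 1)
        density = hist[clipped]
        # Values outside the training range get zero density.
        outside = (X[:, i] < bin_edges[0]) | (X[:, i] > bin_edges[-1])
        density = np.where(outside, 0.0, density)
        scores += -np.log2(density + alpha)
    return scores

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
hists, edges = fit_histograms(X_train)

# A single new sample works, and an out-of-range sample scores higher.
inlier_score = score_with_training_bins(np.array([[0.0, 0.0]]), hists, edges)
outlier_score = score_with_training_bins(np.array([[100.0, -100.0]]), hists, edges)
print(inlier_score, outlier_score)
```

Because no bin count is recomputed from the incoming data, the number of samples passed at prediction time no longer matters.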