fingerprint-securedrop

Investigate rebalancing the training set

Open redshiftzero opened this issue 7 years ago • 2 comments

We have a highly imbalanced machine learning problem: there are far fewer SecureDrop users than non-SecureDrop users. There are many ways of handling this situation, including oversampling the minority class or undersampling the majority class. Several techniques for learning from very skewed classes are implemented in the imbalanced-learn library (https://github.com/scikit-learn-contrib/imbalanced-learn), so we could give some of these a try.
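
To make the two options concrete, here is a minimal standard-library sketch of random oversampling and random undersampling. It is an illustration only, not the project's code and not the imbalanced-learn API: the function names `random_oversample` and `random_undersample`, the toy data, and the 5/95 class split are all assumptions made up for this example. imbalanced-learn provides comparable resamplers (for instance `RandomOverSampler` and `RandomUnderSampler`) that would likely be the better choice in practice.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw with replacement; k is 0 for the largest class.
        for i in rng.choices(idx, k=target - n):
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each larger class so that
    every class is as small as the smallest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        # Draw without replacement so no sample is kept twice.
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 1 = SecureDrop (minority), 0 = non-SecureDrop (majority).
X = [[float(i)] for i in range(100)]
y = [1] * 5 + [0] * 95

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Whichever approach we try, the resampling should be applied to the training split only, so that evaluation still reflects the real class imbalance.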

redshiftzero, Oct 12 '16 00:10