fingerprint-securedrop
Investigate rebalancing the training set
We have a very imbalanced machine learning problem: there are far fewer SecureDrop users than non-SecureDrop users, so the positive class makes up only a small fraction of the training set. There are many ways of handling this, including oversampling the minority class or undersampling the majority class. Several of the techniques used for machine learning with very skewed classes are implemented in this library: https://github.com/scikit-learn-contrib/imbalanced-learn, so we could give some of them a try.
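To make the two options concrete, here is a minimal standard-library sketch of random oversampling and random undersampling. It is an illustration only, not code from this repository: the function names `random_oversample` and `random_undersample`, the toy feature rows, and the 95/5 class split are all assumptions. imbalanced-learn offers equivalent resamplers (`RandomOverSampler` and `RandomUnderSampler`, both with a `fit_resample(X, y)` method) along with more sophisticated ones.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Group feature rows by their class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for features in rows + extra:
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Drop randomly chosen rows until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement to shrink down to the minority size.
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 95 non-SecureDrop traces (label 0), 5 SecureDrop traces (label 1).
X = [[i, i % 7] for i in range(100)]
y = [0] * 95 + [1] * 5

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Whichever approach we try, the resampling should be applied to the training split only, so that the test set keeps the original class ratio and the reported metrics stay realistic.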