Classification for imbalanced data

Open ZhouFang928 opened this issue 8 years ago • 0 comments

Recently, I have been exploring methods for classification on imbalanced data. As far as I know, the most commonly used approach combines resampling/subsampling techniques with classification models such as boosted decision trees, random forests, or others. Various online resources discuss this problem, for example the paper "Handling Imbalanced Data in Customer Churn Prediction Using Combined Sampling and Weighted Random", the blog "8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset", and the GitHub resource https://github.com/topepo/ICHPS2015_Class_Imbalance/commit/master. One question that comes to mind is how imbalanced the data can be, and how we can make our model perform better when the proportion of the minority class drops to the 10%, 1%, or even 0.1% level. I hope this is an interesting topic for you as well.
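To make the resampling step concrete, here is a minimal sketch of random undersampling in plain Python. It is only an illustration under my own assumptions: the function name `undersample_majority`, the synthetic data with a 1% minority class, and the choice to shrink every class to the size of the rarest one are all hypothetical and are not taken from the resources above. The balanced output would then be passed to whichever classifier is being used.

```python
import random
from collections import Counter


def undersample_majority(rows, labels, seed=0):
    """Randomly drop rows so that every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not from any particular library.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, target):  # sampling without replacement
            out_rows.append(row)
            out_labels.append(label)

    # Shuffle so the classes are not stored in contiguous blocks.
    order = list(range(len(out_rows)))
    rng.shuffle(order)
    return [out_rows[i] for i in order], [out_labels[i] for i in order]


# Synthetic example (assumed data): 10 minority rows out of 1000, i.e. 1%.
labels = [1] * 10 + [0] * 990
rows = [[float(i)] for i in range(len(labels))]

balanced_rows, balanced_labels = undersample_majority(rows, labels)
print(Counter(balanced_labels))
```

Note that at a 0.1% minority level this kind of undersampling throws away almost all of the majority rows, which is part of why I am asking how well the approach holds up as the imbalance grows.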

ZhouFang928 Apr 17 '16 10:04