multi-class-text-classification-cnn-rnn

How to deal with the imbalanced data problem?

Open heinze007 opened this issue 8 years ago • 1 comments

I tried to adapt the code to my own text classification data (47 classes in 42000 records) and found that the classifier tends to choose the larger classes like THEFT, ASSAULT and so forth. How do you guys deal with imbalanced data to make the classes more 'balanced'?

heinze007 • Sep 11 '17 02:09

I've tried replacing the loss function, switching from Cross Entropy to Weighted Cross Entropy to give the smaller classes more weight. It works fairly well, but the accuracy only got to around 70%...
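
For anyone reading later, here is a minimal sketch of what a weighted cross entropy can look like. It is not the exact change made to the repo's code (that isn't shown here). It assumes inverse-frequency class weights, and it uses NumPy rather than the repo's framework so it runs on its own. The function names `inverse_frequency_weights` and `weighted_cross_entropy` are made up for illustration.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting scheme: weight_c = N / (num_classes * count_c)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)  # guard against classes with no examples
    return len(labels) / (num_classes * counts)

def weighted_cross_entropy(logits, labels, class_weights):
    """Softmax cross entropy where each example is scaled by its class weight."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class for every example.
    per_example = -log_probs[np.arange(len(labels)), labels]
    weights = class_weights[labels]
    # Normalize by the total weight so the loss scale stays comparable.
    return float((weights * per_example).sum() / weights.sum())

# Toy data: class 0 dominates, class 2 is rare.
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])
class_weights = inverse_frequency_weights(labels, num_classes=3)
logits = np.zeros((len(labels), 3))  # an untrained model that predicts uniformly
loss = weighted_cross_entropy(logits, labels, class_weights)
```

With these weights, a mistake on the rare class contributes six times as much to the loss as a mistake on the largest class, which is what pushes the classifier away from always picking the big classes.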

heinze007 • Sep 11 '17 02:09