
How to deal with the problem of label imbalance?

Open JiaWenqi opened this issue 5 years ago • 1 comment

My training set has 100,000 doc samples and 1,000 tags, but I found that the tags follow a long-tail distribution. Some tags appear in fewer than 10 docs, while others are included in almost every doc. How should I deal with this?

JiaWenqi Mar 14 '19 03:03
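
To make the long tail described above concrete, it can help to count how many docs each tag appears in. The following is a minimal sketch in plain Python, not part of Magpie's API. The function names, the `doc_tags` input (one list of tags per doc) and the two thresholds are hypothetical and chosen only for illustration.

```python
from collections import Counter


def tag_document_counts(doc_tags):
    """Count how many docs each tag appears in.

    doc_tags: iterable of tag lists, one list per doc (hypothetical format).
    """
    counts = Counter()
    for tags in doc_tags:
        # set() so that a tag repeated within one doc is counted only once
        counts.update(set(tags))
    return counts


def split_head_and_tail(counts, n_docs, rare_below=10, common_above=0.9):
    """Return (rare, common) tag lists.

    rare:   tags that appear in fewer than `rare_below` docs
    common: tags that appear in more than `common_above` of all docs
    Both thresholds are illustrative assumptions, not recommended values.
    """
    rare = sorted(t for t, c in counts.items() if c < rare_below)
    common = sorted(t for t, c in counts.items() if c / n_docs > common_above)
    return rare, common
```

With the numbers from the question, `rare` would hold the tags seen in fewer than 10 of the 100,000 docs and `common` the tags present in nearly every doc.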

Magpie will likely learn to almost never recommend the tags from the long tail and will frequently default to the most common ones. If that's not the behaviour you want, you might want to repartition (resample) your dataset so that it has a more balanced class distribution.

jstypka Mar 14 '19 09:03
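
One possible way to rebalance the dataset as suggested above is to oversample docs that carry rare tags. This is a rough sketch under stated assumptions and is not something Magpie provides. The function name `oversample_rare_tags`, the `(text, tags)` pair format and the `min_count` target are hypothetical.

```python
import math
from collections import Counter


def oversample_rare_tags(docs, min_count=50):
    """Repeat docs that carry rare tags.

    docs: list of (text, tags) pairs (hypothetical format).
    min_count: rough target for how many docs each tag should appear in
               (an illustrative assumption, not a recommended value).
    Returns a new list; the input list is left unchanged.
    """
    # Number of docs each tag appears in before resampling
    counts = Counter(tag for _, tags in docs for tag in set(tags))

    balanced = []
    for text, tags in docs:
        if not tags:
            # Docs without tags are kept once
            balanced.append((text, tags))
            continue
        # The rarest tag on a doc decides how often the doc is repeated
        rarest = min(counts[t] for t in set(tags))
        repeats = max(1, math.ceil(min_count / rarest))
        balanced.extend([(text, tags)] * repeats)
    return balanced
```

Two caveats apply to this approach. Because a doc with a rare tag usually also carries common tags, repeating it makes the common tags more frequent too, so the distribution gets flatter but not uniform. Repeated docs should also be kept out of any held-out evaluation set, otherwise the scores will look better than they are.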