
The imbalance between original tokens and replaced tokens

Open · allanchen95 opened this issue 3 years ago · 0 comments

Hi, ELECTRA has inspired me a lot, but one problem puzzles me. As we all know, only about 15% of the tokens are replaced by generated tokens, which can be viewed as the negatives. The remaining ~85% are original tokens, i.e., the positives.
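For concreteness, here is a rough sketch of how the per-token labels end up split. It is not ELECTRA's code: `corrupt_and_label` is a hypothetical helper, a uniform random sampler stands in for the generator (ELECTRA actually uses a small masked language model), and the labels follow the convention above (1 = original/positive, 0 = replaced/negative). As in ELECTRA, a sampled token that happens to equal the original is still labeled original.

```python
import random

def corrupt_and_label(tokens, vocab_size, mask_prob=0.15, seed=0):
    """Hypothetical sketch: corrupt ~mask_prob of positions and label every token.

    Assumption: a uniform sampler replaces the real generator network.
    Labels: 1 = original (positive), 0 = replaced (negative).
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Position selected for masking: draw a replacement token.
            new_tok = rng.randrange(vocab_size)
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # A sampled token identical to the original still counts as original.
        labels.append(1 if new_tok == tok else 0)
    return corrupted, labels

rng = random.Random(1)
tokens = [rng.randrange(1000) for _ in range(10_000)]
corrupted, labels = corrupt_and_label(tokens, vocab_size=1000)
print(f"replaced fraction: {1 - sum(labels) / len(labels):.3f}")
```

The printed fraction should land close to 0.15, so roughly 85% of the discriminator's targets are positives.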
Label imbalance is a common issue in classification problems, and ELECTRA's discriminator is designed to predict a label for every token in a corrupted sentence. So a question arises: when the positives (the original tokens) vastly outnumber the negatives, can ELECTRA's discriminator accurately find all the negatives, i.e., the replaced tokens it should predict as 0?
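To illustrate why the imbalance is worrying: with an 85/15 split, a degenerate discriminator that predicts "original" for every token already reaches 85% accuracy while finding none of the replaced tokens. A minimal sketch (the helper name is hypothetical and the labels are an illustrative 85/15 split, not real ELECTRA outputs):

```python
def accuracy_and_negative_recall(labels, preds):
    """Accuracy over all tokens, and recall on replaced tokens (label 0)."""
    correct = sum(int(l == p) for l, p in zip(labels, preds))
    negative_preds = [p for l, p in zip(labels, preds) if l == 0]
    neg_recall = sum(int(p == 0) for p in negative_preds) / len(negative_preds)
    return correct / len(labels), neg_recall

# Illustrative split: 85 original tokens (1) and 15 replaced tokens (0).
labels = [1] * 85 + [0] * 15
always_original = [1] * len(labels)  # predicts "original" everywhere
print(accuracy_and_negative_recall(labels, always_original))  # (0.85, 0.0)
```

This is why recall (or precision/recall) on the replaced tokens, rather than overall accuracy, is the more telling measure for the question above.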

allanchen95, Dec 04 '20 06:12