
Self-adjustment in the Dice Loss

Open · pinesnow72 opened this issue 3 years ago · 2 comments

I read the ACL 2020 paper, which motivates the self-adjustment in the Dice loss with Figure 1, showing that the derivative approaches zero soon after p exceeds 0.5. That is the case when alpha is 1.0. However, the script for the OntoNotes5 data uses alpha=0.01, which is a very small adjustment and gives almost the same performance as the plain squared form of the Dice loss. When I set alpha=1.0 and train the model with the script on the CoNLL2003 data, the model does not learn well (F1 was about 28.96). I wonder why the self-adjustment does not work well here. Could you explain which value of alpha is best in general?
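
For reference, here is a minimal sketch of the adjusting term as I understand it from the paper (the helper name `adjusted_prob` is mine, not from this repo):

```python
import torch

def adjusted_prob(p: torch.Tensor, alpha: float) -> torch.Tensor:
    # Self-adjusting term from the paper: (1 - p)^alpha * p.
    return ((1 - p) ** alpha) * p

p = torch.tensor([0.3, 0.5, 0.7, 0.9], requires_grad=True)

# With alpha = 1.0 the term is p * (1 - p), whose derivative 1 - 2p
# is zero at p = 0.5 and negative beyond, which is what down-weights
# already well-classified examples.
adjusted_prob(p, alpha=1.0).sum().backward()
print(p.grad)  # tensor([ 0.4000,  0.0000, -0.4000, -0.8000])
```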

pinesnow72 · Apr 20 '21 02:04

Hi, thanks for asking! Take binary classification as an example: setting alpha reduces the relative loss for well-classified examples (p_t > 0.5), putting more focus on hard, misclassified ones. We tune alpha over [0.1, 0.01, 0.001] and select the best value according to the validation set.
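
A quick numerical sketch of why the magnitude of alpha matters so much (the helper `down_weight` is hypothetical, not from this repo): relative to the unadjusted Dice loss, each prediction is scaled by (1 - p)^alpha, and with the small alphas in that grid the scaling is very gentle, which would explain why alpha=0.01 behaves much like the plain squared Dice.

```python
import torch

def down_weight(p: torch.Tensor, alpha: float) -> torch.Tensor:
    # Multiplicative factor applied on top of the plain Dice term:
    # (1 - p)^alpha. Larger alpha suppresses confident predictions harder.
    return (1 - p) ** alpha

p = torch.tensor([0.6, 0.9, 0.99])   # increasingly well-classified examples
for alpha in [0.1, 0.01, 0.001]:     # the grid mentioned above
    print(alpha, down_weight(p, alpha))
# 0.1   -> ~[0.912, 0.794, 0.631]  (noticeable suppression)
# 0.01  -> ~[0.991, 0.977, 0.955]  (mild)
# 0.001 -> ~[0.999, 0.998, 0.995]  (almost no adjustment)
```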

xiaoya-li · Apr 24 '21 02:04

Thanks for the range of alpha values. In my case, a BIO scheme-based NER model (i.e., multi-class sequence labeling), alpha values in the [0.05, 0.1] range were good for learning, but still did not give better performance than CRF or cross-entropy losses.

pinesnow72 · Apr 26 '21 00:04