dice_loss_for_NLP
Self-adjustment in the Dice Loss
I read the ACL 2020 paper, and it motivates the self-adjustment in the Dice Loss with Figure 1, which shows that the derivative approaches zero right after p exceeds 0.5. This is the case when alpha is 1.0. However, the script for the OntoNotes5 data uses alpha=0.01, which is a very small adjustment and gives almost the same performance as the plain squared form of Dice. When I set alpha=1.0 and train the model with the script on the CoNLL2003 data, the model does not learn well (the F1 was about 28.96). I wonder why the self-adjustment does not work well here. Could you explain which value of alpha is best in general?
Hi, thanks for asking! Take binary classification as an example. Setting alpha reduces the relative loss for well-classified examples (p_t > 0.5), putting more focus on hard, misclassified examples. We tune alpha over [0.1, 0.01, 0.001] and select the best value according to the validation set.
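For reference, here is a minimal PyTorch sketch of the binary self-adjusting Dice loss as described in the paper, where the (1 - p)^alpha factor down-weights easy examples. The function name and the default gamma value are illustrative, not the repo's exact API:

```python
import torch

def self_adjusting_dice_loss(logits, targets, alpha=1.0, gamma=1.0):
    """Sketch of the binary self-adjusting Dice loss from the paper.

    logits:  (N,) raw scores for the positive class
    targets: (N,) binary labels in {0, 1}
    alpha:   exponent of the (1 - p)**alpha factor that shrinks the
             contribution of well-classified examples (p close to 1)
    gamma:   smoothing term added to numerator and denominator
    """
    p = torch.sigmoid(logits)          # predicted probability of the positive class
    weight = (1.0 - p) ** alpha        # self-adjusting factor: -> 0 as p -> 1
    numerator = 2.0 * weight * p * targets + gamma
    denominator = weight * p + targets + gamma
    return (1.0 - numerator / denominator).mean()
```

With alpha=0 the weight is constant 1 and the loss reduces to the plain (smoothed) Dice form; larger alpha suppresses the loss of examples the model already classifies confidently.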
Thanks for the range of alpha values. In my case of a BIO scheme-based NER model (which is multi-label sequence classification), alpha values in the [0.05, 0.1] range were good for learning, but they still do not give better performance than CRF or cross-entropy losses.