
Question about the loss of Masked LM

Open · zhezhaoa opened this issue 6 years ago • 5 comments

Thank you very much for this great contribution. I found that the masked LM loss stops decreasing once it reaches a value around 7, whereas in the official TensorFlow implementation the MLM loss easily decreases to 1, so I think something is wrong in this implementation. In addition, I found that the code cannot predict the next sentence correctly. I think the reason is `self.criterion = nn.NLLLoss(ignore_index=0)`: it cannot be used as the criterion for sentence prediction, because the sentence label is 0 or 1, and with `ignore_index=0` every label-0 example is silently dropped from the loss. We should remove `ignore_index=0` for sentence prediction. I am looking forward to your reply~
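
To illustrate what I mean, here is a minimal, untested sketch (the tensor values are made up, just to show the `ignore_index` effect):

```python
import torch
import torch.nn as nn

# Fake output of the next-sentence head: log-probabilities over 2 classes.
log_probs = torch.log_softmax(torch.randn(4, 2), dim=-1)
labels = torch.tensor([0, 0, 1, 0])  # 0 = "not next", 1 = "is next"

shared = nn.NLLLoss(ignore_index=0)
print(shared(log_probs, labels))  # only the single label-1 example contributes

# What I would expect instead: one criterion per task. ignore_index=0 is
# correct for the masked LM (0 is the [PAD] token id there), but not for
# sentence prediction, where 0 is a real class label.
mlm_criterion = nn.NLLLoss(ignore_index=0)
nsp_criterion = nn.NLLLoss()
print(nsp_criterion(log_probs, labels))  # all four examples contribute
```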

zhezhaoa · Dec 07 '18

> I think the reason is: `self.criterion = nn.NLLLoss(ignore_index=0)`. It cannot be used as the criterion for sentence prediction because the sentence label is 1 or 0.

I think you are right. My next-sentence loss is very low, but the next_correct accuracy stays near 50%. That is consistent with the bug: since the label-0 examples never contribute to the loss, the head can drive the loss down by always predicting 1 while the accuracy remains at chance.

tanaka-jp · Dec 14 '18

I've been trying to reproduce BERT's pretraining results from scratch in my own time, and I have been unable to train beyond a masked LM loss of 5.4. So if anyone is able to get past this point, I'd love to learn what you did.

raulpuric · Jan 25 '19

Sorry for the late update. I think your point is right too; I'll fix it ASAP.

codertimo · Apr 08 '19

What is the verdict here regarding the next-sentence task? Should we use two different loss functions, dropping `ignore_index=0` for sentence prediction?

And what about the MLM? Has anyone found a solution? Mine also won't drop below 6 or 7...
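
To make the first question concrete, here is roughly what I have in mind (untested, and the tensor names are hypothetical; this assumes the heads return log-probabilities, as in this repo):

```python
import torch.nn as nn

mlm_criterion = nn.NLLLoss(ignore_index=0)  # 0 = [PAD] token id in the vocab
nsp_criterion = nn.NLLLoss()                # 0 and 1 are both valid labels

def pretrain_loss(mlm_log_probs, mlm_labels, nsp_log_probs, is_next):
    # mlm_log_probs: (batch, seq_len, vocab); NLLLoss wants (N, C, d),
    # so move the vocab dimension to position 1.
    mlm_loss = mlm_criterion(mlm_log_probs.transpose(1, 2), mlm_labels)
    nsp_loss = nsp_criterion(nsp_log_probs, is_next)
    # BERT pretraining optimizes the sum of the two losses.
    return mlm_loss + nsp_loss
```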

itamargol · May 05 '19

I have the same problem...

tanqiao2 · Jun 14 '23