Bert-Chinese-Text-Classification-Pytorch
Question about the masked language model
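In the code, the masked language modeling labels use -1 to mark the masked tokens, and the loss computation ignores these masked tokens. However, the BERT paper states that "the final hidden vectors corresponding to the mask tokens are fed into an output softmax over the vocabulary", meaning the loss is computed only at the masked token positions.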
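For reference, the -1 label convention the question hinges on can be sketched in plain Python. This is a minimal sketch, assuming the loss behaves like `torch.nn.CrossEntropyLoss(ignore_index=-1)` with mean reduction: positions labeled -1 contribute nothing, and the mean is taken over the remaining positions only. Under that assumption, if -1 is assigned to the *unmasked* tokens, the loss ends up being computed only at the masked positions, which matches the paper's description. The function name `masked_lm_loss` and the toy logits are hypothetical, for illustration only.

```python
import math

IGNORE_INDEX = -1  # assumed label value for positions excluded from the loss


def masked_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label != ignore_index.

    logits: list of per-position score lists (one score per vocabulary entry)
    labels: list of target token ids, or ignore_index for positions to skip
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # this position contributes nothing to the loss
        # log-softmax via log-sum-exp, shifted by the max for numerical stability
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
        count += 1
    # Choice of this sketch: return 0.0 when every position is ignored
    # (PyTorch's mean reduction would give nan in that case).
    return total / count if count else 0.0


# Toy example: vocabulary of 3 tokens, sequence of 4 positions.
logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 1.5], [1.0, 1.0, 1.0], [0.7, 0.1, 0.2]]
labels = [-1, 2, -1, 0]  # only positions 1 and 3 (the "masked" ones) are scored
print(masked_lm_loss(logits, labels))
```

Changing the logits at positions 0 and 2 leaves the result unchanged, since those positions carry the -1 label. So what matters is which positions receive -1: the real token ids must sit at the masked positions for the loss to follow the paper.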