BERT4doc-Classification
High perplexity when Further Pre-Training
When doing further pre-training on my own data, the perplexity (ppl) is very high, for example 709. I have 3,582,619 examples and use batch size = 8, epochs = 3, learning rate = 5e-5. Is there any advice? Thanks a lot!
The further pre-training task is masked language modeling, not (causal) language modeling, so I think perplexity may not be a good metric here. Can you set your batch size larger or use gradient accumulation? You can also check the accuracy of the masked language model, as well as the loss curve, to monitor the further pre-training. A sketch of both ideas is given below.
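Below is a minimal sketch (not the repo's own further pre-training script) of what gradient accumulation plus masked-LM accuracy tracking could look like with the Hugging Face `transformers` library; the model name, toy corpus, batch size, and accumulation steps are illustrative assumptions, not settings from this issue.

```python
import torch
from transformers import BertTokenizerFast, BertForMaskedLM, DataCollatorForLanguageModeling

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy corpus standing in for your own in-domain data.
texts = ["further pre-training adapts BERT to the target domain."] * 16
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]

# The collator pads each batch and applies random masking; labels are -100
# on non-masked positions so the loss only covers the masked tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
loader = torch.utils.data.DataLoader(encodings, batch_size=2, collate_fn=collator)

accumulation_steps = 4  # effective batch size = 2 * 4 = 8
optimizer.zero_grad()
for step, batch in enumerate(loader):
    outputs = model(**batch)
    # Scale the loss so the accumulated gradients average over the steps.
    (outputs.loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

    # Masked-LM accuracy, computed only over the masked positions.
    with torch.no_grad():
        mask = batch["labels"] != -100
        preds = outputs.logits.argmax(dim=-1)
        acc = (preds[mask] == batch["labels"][mask]).float().mean().item()
    print(f"step {step}: loss {outputs.loss.item():.3f}, masked-LM acc {acc:.3f}")
```

Logging the masked-LM loss and accuracy over steps gives the curve mentioned above; if the loss keeps decreasing and masked-token accuracy keeps rising on your domain data, the further pre-training is behaving as expected even if the number reported as "perplexity" looks large.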