handwritten-chinese-ocr-samples

Time required to train a transformer-based language model with fairseq

Open · Randy-1009 opened this issue 3 years ago · 1 comment

I am using 3 Tesla V100 GPUs to train a transformer-based model with fairseq. The parameters are set the same as in the provided training command, yet each epoch takes a long time (more than 2 hours). Is this normal? I'd also like to know how long training took when you did this research. Thank you~
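For reference, a minimal sketch of a multi-GPU fairseq language-model training command is shown below. The data path, save directory, and hyperparameter values here are placeholders, not the repository's actual settings; the real command from the repo should be used as-is. Epoch time depends mainly on corpus size, `--max-tokens`, `--update-freq`, and whether `--fp16` is enabled.

```bash
# Hypothetical invocation; paths and hyperparameters are assumptions, not the repo's values.
# fairseq uses all GPUs visible via CUDA_VISIBLE_DEVICES.
CUDA_VISIBLE_DEVICES=0,1,2 fairseq-train data-bin/my-corpus \
    --task language_modeling \
    --arch transformer_lm \
    --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.1 \
    --max-tokens 4096 --update-freq 4 \
    --fp16 \
    --save-dir checkpoints/transformer_lm
```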

Randy-1009 · Dec 24 '21 05:12

I can stop training once the perplexity (PPL) reaches 29.xx, right? After 20 epochs, the perplexity is currently 30.16.
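One way to track this is to watch the validation perplexity that fairseq prints at the end of each epoch. The sketch below assumes the training output was saved to a file named `train.log` (for example via `fairseq-train ... | tee train.log`); that filename is an assumption, not something from the original issue. As an alternative to a fixed PPL target, fairseq's `--patience` flag stops training automatically once validation performance stops improving.

```bash
# Assumes output was captured with: fairseq-train ... | tee train.log
# List the validation perplexity reported for each epoch, then stop training
# manually (or via --patience) once it reaches the target range around 29.
grep "valid" train.log | grep -oE "ppl [0-9.]+"
```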

Randy-1009 · Dec 24 '21 05:12