handwritten-chinese-ocr-samples
Time needed to train a transformer-based language model with fairseq
I am using 3 Tesla V100 GPUs to train a transformer-based language model with fairseq. The parameters are set the same as in the given training command. However, each epoch takes a long time (more than 2 hours). Is this normal? I'd like to know how long training took when you did this research. Thank you~
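For reference, this is roughly the kind of invocation I mean; it is a minimal sketch assuming the standard fairseq language-modeling setup, and the data path, save directory, and hyperparameter values are placeholders, not the exact settings from this repo:

```bash
# Sketch of a transformer LM training run with fairseq.
# data-bin/my-corpus and the hyperparameters below are assumptions;
# CUDA_VISIBLE_DEVICES restricts training to the 3 V100s mentioned above.
CUDA_VISIBLE_DEVICES=0,1,2 fairseq-train data-bin/my-corpus \
    --task language_modeling \
    --arch transformer_lm --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --tokens-per-sample 512 --sample-break-mode none \
    --max-tokens 2048 --update-freq 16 \
    --fp16 --save-dir checkpoints/transformer_lm
```

Epoch time depends mostly on corpus size and the effective batch size (`--max-tokens` × `--update-freq` × number of GPUs), so I realize 2+ hours per epoch may well be plausible for a large corpus.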
I can stop training once the perplexity (PPL) reaches 29.xx, right? Currently, after 20 epochs, the perplexity is 30.16.
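As a side note, if the intent is to stop automatically when validation stops improving rather than at a fixed PPL target, fairseq supports early stopping via `--patience`; a sketch with illustrative values (same assumed data path as above):

```bash
# Stop if validation loss does not improve for 5 consecutive validation
# runs (the patience value is illustrative, not the repo's setting).
CUDA_VISIBLE_DEVICES=0,1,2 fairseq-train data-bin/my-corpus \
    --task language_modeling --arch transformer_lm \
    --patience 5 \
    --save-dir checkpoints/transformer_lm
```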