
Number of training steps

sam1373 opened this issue · 1 comment

I have been trying to replicate the results from the paper, but I'm confused about the number of training steps. The paper mentions 240k steps, but when running this code on 8 V100 GPUs, 240k steps takes a lot longer than the 3 days reported in the paper. The base config here specifies 10000 epochs, which also doesn't seem like the correct amount. Could you clarify the correct number of training epochs/steps?

sam1373 · Jun 12, 2020

10000 epochs is not meant literally. You can reduce the number of epochs, or simply stop the run partway through training.
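For a rough sense of how epochs map onto steps, here is a back-of-the-envelope sketch. The per-GPU batch size and GPU count below are illustrative assumptions, not values taken from the repo's config; the dataset size is LJSpeech's 13,100 clips:

```python
# Rough estimate of how many epochs correspond to a target number of
# optimizer steps. Numbers here are assumptions for illustration only.

target_steps = 240_000     # steps reported in the paper
dataset_size = 13_100      # LJSpeech utterances (adjust for your dataset)
batch_per_gpu = 32         # assumed per-GPU batch size
n_gpus = 2                 # base setting discussed in this thread

steps_per_epoch = dataset_size // (batch_per_gpu * n_gpus)
epochs_needed = target_steps / steps_per_epoch
print(f"{steps_per_epoch} steps/epoch -> ~{epochs_needed:.0f} epochs for {target_steps} steps")
```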

I trained my model with the base config on 2 V100 GPUs, and it took ~3.3 days. I think using a large batch size might be causing the slow training. The monotonic alignment search always runs on CPU cores, so if the number of CPU cores does not grow proportionally with the number of GPUs, the CPU can take much longer to search alignments for a batch 4 times larger than the base setting.
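To make the CPU-bound part concrete, here is a simplified NumPy sketch of the dynamic program behind monotonic alignment search (the repository uses a Cython implementation; this is illustrative only, assuming a per-utterance log-likelihood matrix shaped `[t_text, t_mel]`). Because the search is sequential per utterance and runs on the CPU, a 4x larger batch means roughly 4x more of this work every training step:

```python
import numpy as np

def monotonic_alignment_search(value):
    """Simplified sketch of the monotonic alignment DP.

    value: [t_text, t_mel] matrix of log-likelihoods (t_mel >= t_text).
    Returns a 0/1 path matrix of the same shape.
    """
    t_text, t_mel = value.shape
    q = np.full((t_text, t_mel), -np.inf)

    # Forward pass: each mel frame either stays on the current text token
    # or advances to it from the previous one.
    q[0, 0] = value[0, 0]
    for j in range(1, t_mel):
        for i in range(max(0, t_text - t_mel + j), min(j + 1, t_text)):
            stay = q[i, j - 1] if i <= j - 1 else -np.inf
            advance = q[i - 1, j - 1] if i > 0 else -np.inf
            q[i, j] = value[i, j] + max(stay, advance)

    # Backtracking: recover the hard alignment from the accumulated scores.
    path = np.zeros_like(value, dtype=np.int64)
    i = t_text - 1
    for j in range(t_mel - 1, -1, -1):
        path[i, j] = 1
        if i > 0 and (j == i or q[i - 1, j - 1] > q[i, j - 1]):
            i -= 1
    return path
```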

And given the 4x larger batch size, I think it would make sense to stop training at fewer than 240k steps.
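As a rough data-equivalence check (my arithmetic, not a number from the authors):

```python
# With a 4x larger effective batch, each step sees 4x as much data, so the
# data the paper covers in 240k steps is covered in roughly a quarter of the steps.
paper_steps = 240_000
batch_scale = 4                      # e.g. 8 GPUs vs. the 2-GPU base setting
print(paper_steps // batch_scale)    # 60000
```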

jaywalnut310 · Jun 16, 2020