
Training time on VCTK.

Open mudong0419 opened this issue 3 years ago • 5 comments

Thanks for your great work. I have been training a multi-speaker VITS model for 160,000 steps over 2 days on 8 V100 GPUs. The synthesized speech is clear but not that fluent. How many steps did you train on the VCTK dataset, and how long did it take? Thanks in advance.

mudong0419 avatar Sep 04 '21 13:09 mudong0419

What is the status now? Do you know the training time on LJ Speech? Thanks.

MaxMax2016 avatar Sep 16 '21 01:09 MaxMax2016

I've trained for 450,000 steps, and the synthesized speech is much better.

mudong0419 avatar Sep 17 '21 03:09 mudong0419

160k steps took 2 days of training even on 8 GPUs, and I only have 1 GPU; the training speed seems too slow. Any ideas to improve this?

HaiFengZeng avatar Jan 28 '22 01:01 HaiFengZeng
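One knob that is already present in the repo's JSON configs and can help on a single GPU is mixed-precision training via the `fp16_run` flag in the `train` section. A minimal sketch of toggling it (the keys follow the repo's configs; the values here are illustrative, not tuned recommendations):

```python
import json

# VITS training configs are JSON files with a "train" section.
# "fp16_run" enables mixed-precision training, which usually speeds up
# each step on a single GPU; a smaller batch size keeps memory in check.
cfg_text = """
{
  "train": {
    "fp16_run": false,
    "batch_size": 64,
    "learning_rate": 2e-4
  }
}
"""
cfg = json.loads(cfg_text)
cfg["train"]["fp16_run"] = True   # switch on mixed precision
cfg["train"]["batch_size"] = 32   # illustrative single-GPU batch size
print(json.dumps(cfg["train"], indent=2))
```

Lowering the batch size also shortens per-step time, but note that fewer samples per step generally means more steps to reach the same data coverage.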

> 160k steps took 2 days of training even on 8 GPUs, and I only have 1 GPU; the training speed seems too slow. Any ideas to improve this?

I imagine the slow convergence is largely due to the jointly optimized HiFi-GAN decoder, since older acoustic models that predict mel spectrograms seemed to be much easier to train. Maybe we can adapt to new data by starting from existing checkpoints? But how would one handle the multi-speaker conditioning in the decoder then?

sos1sos2Sixteen avatar Apr 18 '22 13:04 sos1sos2Sixteen
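On the conditioning question: in VITS-style models the speaker identity enters the decoder as a global conditioning vector from a learned embedding table. A minimal, hypothetical sketch of that pattern (the names `n_speakers` and `gin_channels` follow the repo's conventions, but this stand-in module is not the actual model code):

```python
import torch
import torch.nn as nn

class SpeakerConditionedDecoder(nn.Module):
    """Toy stand-in for a decoder conditioned on a speaker ID."""

    def __init__(self, n_speakers: int, gin_channels: int, hidden_channels: int):
        super().__init__()
        self.emb_g = nn.Embedding(n_speakers, gin_channels)        # one vector per speaker
        self.cond = nn.Conv1d(gin_channels, hidden_channels, 1)    # project g into decoder width
        self.net = nn.Conv1d(hidden_channels, hidden_channels, 3, padding=1)  # stand-in for the upsampling stacks

    def forward(self, z: torch.Tensor, sid: torch.Tensor) -> torch.Tensor:
        # z: (batch, hidden_channels, frames); sid: (batch,) speaker indices
        g = self.emb_g(sid).unsqueeze(-1)   # (batch, gin_channels, 1)
        h = z + self.cond(g)                # broadcast the conditioning over time
        return self.net(h)

dec = SpeakerConditionedDecoder(n_speakers=109, gin_channels=256, hidden_channels=192)
out = dec(torch.randn(2, 192, 50), torch.tensor([3, 7]))
print(out.shape)  # torch.Size([2, 192, 50])
```

So a fine-tuned checkpoint would need either the same embedding table (same speaker set) or a freshly initialized one for the new speakers; the rest of the decoder weights can be reused.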

Hey, can you tell me about the dataset format for multi-speaker training, especially the folder structure?

kin0303 avatar Apr 22 '22 06:04 kin0303
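For reference, the multi-speaker filelists shipped with the VITS repo (e.g. `filelists/vctk_audio_sid_text_train_filelist.txt`) use one pipe-separated line per utterance, `wav_path|speaker_id|text`, so the folder structure itself is flexible as long as the paths in the filelist are correct. A small parsing sketch (the example paths below are hypothetical):

```python
def parse_filelist(lines):
    """Split each 'path|sid|text' line into a (path, sid, text) tuple."""
    items = []
    for line in lines:
        path, sid, text = line.strip().split("|", 2)
        items.append((path, int(sid), text))
    return items

example = [
    "DUMMY2/p225/p225_001.wav|0|Please call Stella.",
    "DUMMY2/p226/p226_001.wav|1|Please call Stella.",
]
for path, sid, text in parse_filelist(example):
    print(sid, path)
```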