vits
Training time on VCTK.
Thanks for your great work. I have been training a multi-speaker VITS model for 160,000 steps over 2 days on 8 V100 GPUs. The synthesized speech is clear but not very fluent. How many steps did you train on the VCTK dataset, and how long did it take? Thanks in advance.
What is the status now? Do you know the training time on LJSpeech? Thanks.
I've trained for 450,000 steps, and the synthesized speech is much better.
160k steps takes 2 days even on 8 GPUs, and I only have 1 GPU. The training speed seems too slow; any ideas to improve this?
I imagine the slow improvement is largely due to the jointly optimized HiFi-GAN decoder, since older acoustic models that predict mel spectrograms seemed much easier to train. Maybe we could adapt to new data from existing checkpoints? How would one handle the multi-speaker conditioning on the decoder then? A rough sketch of what I mean is below.
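Minimal sketch of the idea, assuming a gin_channels-style global conditioning hook like the one in this codebase (the class and file names here are hypothetical, not the repo's exact API): look the speaker id up in an embedding table, project it with a 1x1 convolution, add it to the decoder input before the upsampling stack, and warm-start everything else from an existing checkpoint.

```python
# Hypothetical sketch, not the repo's exact API: a HiFi-GAN-style decoder
# conditioned on a speaker embedding via a 1x1 global-conditioning projection.
import torch
import torch.nn as nn

class SpeakerConditionedDecoder(nn.Module):
    def __init__(self, in_channels=192, hidden_channels=512,
                 n_speakers=109, gin_channels=256):
        super().__init__()
        self.emb_g = nn.Embedding(n_speakers, gin_channels)      # speaker lookup table
        self.conv_pre = nn.Conv1d(in_channels, hidden_channels, 7, padding=3)
        self.cond = nn.Conv1d(gin_channels, hidden_channels, 1)  # inject speaker info
        # ... the usual upsampling / residual stacks of the vocoder would follow ...

    def forward(self, z, sid):
        g = self.emb_g(sid).unsqueeze(-1)     # [B, gin_channels, 1]
        x = self.conv_pre(z) + self.cond(g)   # broadcast speaker vector over time
        return x

# Warm-start from a single-speaker checkpoint ("pretrained_decoder.pth" is a
# placeholder path); strict=False skips the speaker-embedding weights the old
# model does not have, so only those start from random initialization.
model = SpeakerConditionedDecoder()
ckpt = torch.load("pretrained_decoder.pth", map_location="cpu")
model.load_state_dict(ckpt, strict=False)
```

With that kind of warm start, only the new speaker-embedding (and conditioning) weights train from scratch, which is what I was hoping would cut down the adaptation time.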
Hey, can you tell me about the dataset format for multi-speaker training, especially the folder structure?
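Not the author, but in case it helps: as far as I understand, the multi-speaker setup is driven by the filelists rather than any particular folder structure, with one pipe-separated line per utterance (wav path, integer speaker id, text). Please double-check against the filelists shipped in the repo; the paths and text below are only illustrative.

```text
wavs/speaker01/utt_001.wav|0|Please call Stella.
wavs/speaker02/utt_001.wav|1|Ask her to bring these things with her from the store.
```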