Train_Hifigan_XTTS Output model generates noisy sounds, robotic voices

Hi @tuanh123789, after training for 3 days, my output model generates noisy, robotic-sounding voices. These are my dataset specs:

Over 400 hours of speech in total (Vietnamese).
Clean, high-quality audio.
Diverse accents and genders, collected from YouTube.
I fine-tune HifiGAN from this checkpoint: https://huggingface.co/capleaf/viXTTS
I have trained for 155127 epochs, and here is the output (Drive file attached). I would really appreciate your help with this issue.

Nov 29 '24 03:11 thucth-qt

Hi, what batch size are you using, and how many GPUs are you training on? With 400 hours of data, I think you should let the training continue. Also note that this version works better with a single speaker, so choose one target speaker to fine-tune on for the best results. The multi-speaker training version uses a loss function that I haven't added yet.
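
One way to act on the single-speaker suggestion is to filter the training metadata down to one speaker before fine-tuning. The snippet below is only a rough sketch, not part of this repo: it assumes a pipe-delimited metadata file with a header row and `audio_file`, `text`, and `speaker_name` columns, and the function name `filter_to_single_speaker` is made up for illustration. Adapt the column names to whatever your dataset actually uses.

```python
import csv
import io
from collections import Counter


def filter_to_single_speaker(metadata_text, speaker=None, delimiter="|"):
    """Keep only the rows that belong to one speaker.

    Assumes a header row with `audio_file`, `text`, and `speaker_name`
    columns (a hypothetical layout, not taken from the repo).
    If `speaker` is None, the speaker with the most utterances is chosen.
    """
    reader = csv.DictReader(
        io.StringIO(metadata_text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    rows = list(reader)
    if not rows:
        return None, []
    if speaker is None:
        # Pick the speaker that appears in the largest number of rows.
        counts = Counter(row["speaker_name"] for row in rows)
        speaker = counts.most_common(1)[0][0]
    kept = [row for row in rows if row["speaker_name"] == speaker]
    return speaker, kept


# Tiny in-memory example standing in for a real metadata file.
sample = """audio_file|text|speaker_name
wavs/a1.wav|xin chào|spk_a
wavs/b1.wav|cảm ơn|spk_b
wavs/a2.wav|tạm biệt|spk_a
"""

speaker, kept = filter_to_single_speaker(sample)
print(speaker, len(kept))
```

Counting rows is only a proxy for how much audio each speaker has; if clip lengths vary a lot, summing durations per speaker would be a better way to choose the target.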

Nov 29 '24 03:11 tuanh123789

@tuanh123789 Hi, is the multi-speaker version ready now?

Mar 19 '25 07:03 xiaoyangnihao