Train_Hifigan_XTTS

Output model generates noisy sounds, robotic voices

Open thucth-qt opened this issue 1 year ago • 2 comments

Hi @tuanh123789, after training for 3 days, my output model generates noisy, robotic-sounding voices. These are my dataset specs:

  • over 400 hours of Vietnamese speech in total
  • clear, high-quality dataset
  • diverse accents, genders, etc., collected from YouTube
  • I fine-tuned HiFi-GAN from the checkpoint: https://huggingface.co/capleaf/viXTTS
  • I have trained for 155127 epochs, and here is the output (drive file attached). I would greatly appreciate any help with this issue.

thucth-qt • Nov 29 '24 03:11

Hi, what batch size and how many GPUs are you using for training? With 400 hours of data, I think you should let the training continue. Also note that this version works better with a single speaker, so just choose one target speaker to fine-tune on for the best result. The multi-speaker training version includes a loss function that I haven't added yet.
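
For anyone trying the single-speaker suggestion above, here is a minimal sketch of narrowing a multi-speaker manifest down to one target speaker. The pipe-delimited layout, the column names (`audio_file`, `speaker`, `duration`), and the helper `pick_single_speaker` are assumptions for illustration only, not the format this repo requires, so adapt them to your own metadata.

```python
import csv
import io
from collections import defaultdict


def pick_single_speaker(rows, speaker=None):
    """Return (target, subset) with only the rows of one speaker.

    If no speaker is given, fall back to the speaker with the most
    total audio duration (a heuristic, not a rule from this repo).
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["speaker"]] += float(row["duration"])
    target = speaker or max(totals, key=totals.get)
    subset = [row for row in rows if row["speaker"] == target]
    return target, subset


# Hypothetical manifest: audio path, speaker id, clip duration in seconds.
manifest = """audio_file|speaker|duration
wavs/a_001.wav|spk_a|4.2
wavs/b_001.wav|spk_b|3.1
wavs/a_002.wav|spk_a|5.0
"""

rows = list(csv.DictReader(io.StringIO(manifest), delimiter="|"))
target, subset = pick_single_speaker(rows)
print(target, len(subset))
```

Passing an explicit speaker id as the second argument selects that speaker instead of the one with the most audio.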

tuanh123789 • Nov 29 '24 03:11

@tuanh123789 Hi, is the multi-speaker version ready now?

xiaoyangnihao • Mar 19 '25 07:03