
FastSpeech2 trained using LibriTTS dataset

Open LEEYOONHYUNG opened this issue 4 years ago • 3 comments

Hi, my name is Yoonhyung Lee, and I am studying text-to-speech. Thank you for your nice implementation of FastSpeech2. It helped me a lot in studying the model, but a question occurred to me.

According to the README.md, it seems that you have trained FastSpeech2 on the LibriTTS dataset, but I cannot find the audio samples. Did you use all 585 hours of the dataset for training? How well does FastSpeech2 work on a multi-speaker dataset?

LEEYOONHYUNG avatar May 17 '21 15:05 LEEYOONHYUNG

@LEEYOONHYUNG Oh, I just forgot to post the audio samples. I'll update the demo page some other day. Honestly speaking, the quality of the synthesized LibriTTS samples is not as good as the results on the single-speaker dataset. I guess it is because the environmental noise in the LibriTTS dataset is much more severe than in the LJSpeech dataset. It might be a good idea to apply some data cleaning tricks before training the TTS model.
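As a rough illustration of such cleaning, one common trick is to trim leading/trailing low-energy silence and peak-normalize each clip before feature extraction. This is a minimal NumPy sketch, not code from this repo; the frame size and dB threshold are arbitrary assumptions you would tune per dataset:

```python
import numpy as np

def clean_clip(audio, frame=512, db_threshold=-40.0):
    """Peak-normalize, then trim leading/trailing low-energy frames."""
    # Peak-normalize to [-1, 1] so the energy threshold is comparable across clips
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio = audio / peak
    # Frame-wise RMS energy in dB
    n_frames = len(audio) // frame
    frames = audio[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20.0 * np.log10(np.maximum(rms, 1e-10))
    # Keep the span between the first and last frame above the threshold
    keep = np.where(db > db_threshold)[0]
    if len(keep) == 0:
        return audio[:0]
    return audio[keep[0] * frame : (keep[-1] + 1) * frame]

# Example: 1 s of silence, 1 s of a 220 Hz tone, 1 s of silence at 16 kHz
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clip = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 220 * t), np.zeros(sr)])
cleaned = clean_clip(clip)  # roughly the middle 1 s survives
```

In practice one would also want to remove stationary background noise (e.g. with a spectral-gating denoiser) and discard clips whose overall SNR is too low, since simple trimming does nothing about noise overlapping the speech itself.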

ming024 avatar May 26 '21 07:05 ming024

I think it is quite natural that learning multi-speaker TTS is more difficult. Thank you for your reply :D

LEEYOONHYUNG avatar May 27 '21 01:05 LEEYOONHYUNG

@ming024 any chance you could post a pretrained model for the multi-speaker English dataset, LibriTTS?

Great work with this repo, and thanks in advance!

lkurlandski avatar Jun 16 '21 17:06 lkurlandski