FastSpeech
Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"
Training
- Set data_path in hparams.py to the LJSpeech folder
- Set teacher_dir in hparams.py to the data directory where the alignments and mel-spectrogram targets are saved
- Provide the checkpoint of the pre-trained transformer-tts (the weights of its embedding/encoder layers are reused)
python train.py
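In FastSpeech, alignments like those saved in teacher_dir provide the per-token durations the model learns to predict. The FastSpeech paper derives them from the teacher's encoder-decoder attention. It first picks the attention head with the highest focus rate, which is the mean over mel frames of each frame's largest attention weight. It then sets each token's duration to the number of mel frames whose strongest attention falls on that token. Below is a minimal pure-Python sketch of that procedure. The function names and the list-of-lists layout (rows: mel frames, columns: input tokens) are assumptions for illustration. They are not this repository's code or the format of its saved alignments.

```python
from typing import List

# Assumed layout: attn[s][t] is the weight of mel frame s on input token t.
Matrix = List[List[float]]


def focus_rate(attn: Matrix) -> float:
    """Mean over mel frames of each frame's largest attention weight.

    Close to 1.0 when every frame attends sharply to a single token, as in a
    clean, near-diagonal alignment. Assumes a non-empty matrix.
    """
    return sum(max(frame) for frame in attn) / len(attn)


def durations_from_attention(attn: Matrix) -> List[int]:
    """Assign each mel frame to its most-attended token and count frames per token."""
    num_tokens = len(attn[0])
    durations = [0] * num_tokens
    for frame in attn:
        # Index of the token this frame attends to most (first one on ties).
        best = max(range(num_tokens), key=lambda t: frame[t])
        durations[best] += 1
    return durations


def extract_durations(heads: List[Matrix]) -> List[int]:
    """Pick the head with the highest focus rate, then read durations off it."""
    best_head = max(heads, key=focus_rate)
    return durations_from_attention(best_head)


# Two hypothetical attention heads for 5 mel frames over 3 input tokens.
sharp = [
    [0.9, 0.1, 0.0],
    [0.7, 0.3, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 0.4, 0.6],
    [0.0, 0.1, 0.9],
]
blurry = [[0.4, 0.3, 0.3] for _ in range(5)]

print(focus_rate(sharp), focus_rate(blurry))
print(extract_durations([blurry, sharp]))  # [2, 1, 2]
```

The sharp, diagonal head yields plausible durations ([2, 1, 2]), and they sum to the number of mel frames. The blurry head would assign every frame to the first token ([5, 0, 0]). This is why the quality of the teacher's attention determines which utterances give usable alignments.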
Training curves (orange: character / blue: phoneme)
The training set sizes differ because the transformer-tts trained on phonemes produces more diagonal attention, so more utterances yield usable alignments.
train:val:test = 8:1:1; total: character 1126 / phoneme 3412
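The 8:1:1 ratio does not say how sizes are rounded. The sketch below assumes the train and validation counts are rounded down and the test set takes the remainder. It also assumes a seeded shuffle before splitting. The README states neither, and split_8_1_1 is a hypothetical helper that is not part of this repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_8_1_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle deterministically, then split 8:1:1 (train/val rounded down, test gets the rest)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder absorbs rounding
    return train, val, test


for total in (1126, 3412):
    train, val, test = split_8_1_1(range(total))
    print(total, len(train), len(val), len(test))
# 1126 900 112 114
# 3412 2729 341 342
```

With other rounding rules, the split sizes can differ by a few utterances. The ratio itself is unchanged.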
Training plots (orange: batch_size:64 / blue: batch_size:32)
Audio Samples
You can hear the audio samples here