gst-tacotron
Training with custom data
Curious whether others have achieved reasonable results training on custom data. I tried training the model on the data from https://github.com/aomv/voiceloop-in-the-wild-experiments/tree/master/data/donald-trump/data, which consists of audio files a few seconds long with transcriptions, totalling roughly a couple of hours. I made a metadata.csv file in the same format as the LJSpeech dataset.
Although I've trained for several hours with a steadily decreasing loss, the graph indicates that the model is not learning properly. After several attempts, I've also failed to generate intelligible audio, at least without using a reference audio.
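For anyone preparing a similar custom dataset, here is a minimal sketch of writing a metadata.csv in the LJSpeech layout, where each line is pipe-delimited as `id|transcription|normalized transcription`. The function name `write_ljspeech_metadata` and the `(clip_id, transcript)` input shape are hypothetical, and reusing the raw transcript as the normalized field is an assumption; it is not taken from this repo's preprocessing code.

```python
def write_ljspeech_metadata(entries, out_path):
    """Write (clip_id, transcript) pairs as LJSpeech-style metadata lines.

    Hypothetical helper: each output line is `id|text|text`, since LJSpeech
    has three pipe-delimited fields (id, transcription, normalized
    transcription). Here the raw transcript is reused for the third field.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        for clip_id, text in entries:
            # Collapse newlines and repeated spaces so each clip stays on one line.
            text = " ".join(text.split())
            if not text:
                # Skip clips with an empty transcript rather than writing a blank field.
                continue
            if "|" in clip_id or "|" in text:
                # A literal pipe would break the field layout.
                raise ValueError(f"pipe character in entry {clip_id!r}")
            f.write(f"{clip_id}|{text}|{text}\n")
            count += 1
    return count
```

Calling `write_ljspeech_metadata([("clip_0001", "Hello there.")], "metadata.csv")` writes one line, `clip_0001|Hello there.|Hello there.`, and returns the number of lines written. The clip id is assumed to match the wav file name without its extension, as in LJSpeech.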
@wanshun123 Hi, I cannot open the data link to check the quality of the data. I have tried several different datasets before, and the model worked on them.
Besides, the attention mechanism used in this repo is a very basic one, which does not work well for generating long sentences.
@wanshun123 Did you train using use_gst=False? I have the same issue when use_gst=False but not when True.
@syang1993 In my case the audio seems intelligible, although not of good quality. I am using the Emotional Speech Dataset from https://hltsingapore.github.io/ESD/download.html.
The English data shows similar attention "collapse". The Chinese data is ok.