gst-tacotron Training with custom data

Open wanshun123 opened this issue 6 years ago • 2 comments

Has anyone achieved reasonable results training on custom data? I tried training the model on the data from https://github.com/aomv/voiceloop-in-the-wild-experiments/tree/master/data/donald-trump/data, which consists of audio clips a few seconds long with their transcriptions (roughly a couple of hours in total), and I built a metadata.csv file in the same format as the LJSpeech dataset.
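
For readers preparing a similar dataset, here is a minimal sketch of writing an LJSpeech-style metadata.csv. LJSpeech uses one line per clip with three pipe-separated fields: clip ID, transcription, and normalized transcription. The function name `write_ljspeech_metadata` and the sample file names are hypothetical. Reusing the raw transcript as the "normalized" field is an assumption, and it is not confirmed here which field this repo's preprocessing reads.

```python
from pathlib import Path


def write_ljspeech_metadata(entries, out_path):
    """Write (wav_path, transcript) pairs as LJSpeech-style metadata lines.

    Each output line is: <clip id>|<transcript>|<normalized transcript>.
    Assumption: no separate text normalization is available, so the same
    whitespace-collapsed transcript is written to both text fields.
    """
    lines = []
    for wav_path, transcript in entries:
        clip_id = Path(wav_path).stem          # "wavs/clip_0001.wav" -> "clip_0001"
        text = " ".join(transcript.split())    # collapse newlines and repeated spaces
        if not text:
            raise ValueError(f"empty transcript for {wav_path}")
        if "|" in text:
            # The pipe is the field separator, so it cannot appear in the text.
            raise ValueError(f"transcript for {wav_path} contains '|'")
        lines.append(f"{clip_id}|{text}|{text}")
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


if __name__ == "__main__":
    # Hypothetical sample entries, for illustration only.
    sample = [
        ("wavs/clip_0001.wav", "Thank you very much."),
        ("wavs/clip_0002.wav", "It is great to be here."),
    ]
    count = write_ljspeech_metadata(sample, "metadata.csv")
    print(f"wrote {count} lines")
```

Rejecting empty transcripts and stray pipe characters up front is a cheap way to rule out malformed metadata before suspecting the model.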

Although I've trained for several hours and the loss decreases steadily, the alignment plots below suggest the model is not learning the attention properly. I have also been unable to generate intelligible audio, at least without a reference audio, despite several attempts.

[Image: step-34000-align]

[Image: eval-34000_ref-randomweight-align]
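
As a rough complement to inspecting alignment plots by eye, the sketch below computes a few summary numbers from an attention matrix. It is a hypothetical heuristic, not part of this repo: the function name `alignment_stats`, the three metrics, and the input layout (`alignment[t][n]` is the weight decoder step `t` puts on encoder step `n`) are all assumptions made for illustration.

```python
def alignment_stats(alignment):
    """Summarize an attention alignment given as a list of decoder-step rows.

    alignment[t][n] is the attention weight of decoder step t on encoder
    step n (assumed layout). Returns three heuristic values in [0, 1]:
      sharpness: mean of the largest weight per decoder step
      monotonic: fraction of consecutive steps whose peak does not move backward
      coverage:  fraction of encoder steps that are the peak of some decoder step
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must be a non-empty 2-D list")
    # Index of the most-attended encoder step for each decoder step
    # (ties resolve to the lowest index).
    peaks = [max(range(len(row)), key=row.__getitem__) for row in alignment]
    sharpness = sum(max(row) for row in alignment) / len(alignment)
    pairs = list(zip(peaks, peaks[1:]))
    monotonic = sum(1 for a, b in pairs if b >= a) / len(pairs) if pairs else 1.0
    coverage = len(set(peaks)) / len(alignment[0])
    return {"sharpness": sharpness, "monotonic": monotonic, "coverage": coverage}


if __name__ == "__main__":
    # A clean diagonal alignment versus a flat one that never focuses.
    diagonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    flat = [[0.25, 0.25, 0.25, 0.25] for _ in range(5)]
    print(alignment_stats(diagonal))
    print(alignment_stats(flat))
```

Under these assumptions, a well-learned alignment tends toward high sharpness and coverage, while a flat or collapsed one shows low sharpness or a peak stuck on a few encoder steps. The loss can keep decreasing in either case, so these numbers are only a sanity check alongside the plots.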

Nov 19 '18 23:11 wanshun123

@wanshun123 Hi, I cannot open the data link, so I can't check the quality of the data. I have tried several different datasets before and training worked on them.

Besides, the attention mechanism used in this repo is a very basic one, and it does not handle long sentences well.

Nov 26 '18 08:11 syang1993

@wanshun123 Did you train with use_gst=False? I have the same issue with use_gst=False, but not with use_gst=True.

@syang1993 In my case the audio is intelligible, although the quality is not good. I am using the Emotional Speech Dataset from https://hltsingapore.github.io/ESD/download.html

The English data shows similar attention "collapse". The Chinese data is ok.

[Image: step-190000-align]

Jun 15 '21 16:06 iamanigeeit
