
Tone transfer

Open switchzts opened this issue 6 years ago • 4 comments

I would like to know whether this model learns only the rhythm of the reference utterance you provide, rather than its tone. Can I use this model to imitate a speaker's tone from a single sentence of their speech?

switchzts • Jul 13 '18 05:07

The style is learned in an unsupervised way, which means there is no constraint that makes the model focus only on prosody. If you read Google's other paper, you will see that it may also learn some speaker information.

syang1993 • Jul 13 '18 06:07

@syang1993 Thanks for the reply. Does that mean the training data needs sentences spoken by the same person in different rhythms? What data is in the Blizzard Challenge 2013 set? I am still downloading it. Is it a training set of one speaker with varied rhythms?

switchzts • Jul 13 '18 06:07

The Blizzard 2013 dataset is audio book data of a single speaker, which contains rich prosody. Besides, if you use neural data to train this model, the model will not learn the prosody information. It may work as traditional tacotron.

syang1993 • Jul 14 '18 05:07

@syang1993 Hi, thanks for your nice work. As you mentioned above: "Besides, if you use neural data to train this model, the model will not learn the prosody information. It may work as traditional tacotron". What do you mean by "neural data"? Right now, training with your published code, my model hardly learns any prosody information. What should I do next?

GengwangGitHub • Sep 18 '18 03:09