gst-tacotron

How to integrate this code with r9y9's wavenet_vocoder?

Open · rishikksh20 opened this issue 5 years ago · 13 comments

Is there any way to integrate this code with the WaveNet vocoder?

rishikksh20 · Jul 23 '18 11:07

Hi, the main part of this model is similar to Tacotron 1. We could also add the style embedding part to Tacotron 2 and then integrate it with WaveNet to get better results.
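For anyone porting the style embedding part into a Tacotron 2 codebase, a minimal sketch of what a style token layer computes is shown below, in plain Python so it runs anywhere. It assumes a single attention head (the GST paper uses multi-head attention), assumes the reference embedding has already been projected to `token_dim`, and uses random values in place of trained tokens. The class and function names are hypothetical and not taken from this repo.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class StyleTokenLayer:
    """Hypothetical single-head sketch of a global style token (GST) layer.

    Assumptions: the reference embedding is already projected to token_dim,
    and random tokens stand in for trained ones.
    """

    def __init__(self, num_tokens=10, token_dim=8, seed=0):
        rng = random.Random(seed)
        self.token_dim = token_dim
        # Bank of learnable style tokens (random init as a placeholder).
        self.tokens = [
            [rng.gauss(0.0, 0.5) for _ in range(token_dim)]
            for _ in range(num_tokens)
        ]

    def __call__(self, ref_embedding):
        # tanh-squash the tokens; here they serve as both keys and values.
        keys = [[math.tanh(v) for v in tok] for tok in self.tokens]
        scale = 1.0 / math.sqrt(self.token_dim)
        # Scaled dot-product score between the reference query and each token.
        weights = softmax([dot(ref_embedding, k) * scale for k in keys])
        # Style embedding = attention-weighted sum of the squashed tokens.
        style = [
            sum(w * k[i] for w, k in zip(weights, keys))
            for i in range(self.token_dim)
        ]
        return style, weights


gst = StyleTokenLayer(num_tokens=10, token_dim=8)
style, weights = gst([0.1] * 8)
print(len(style), sum(weights))
```

Because the style embedding is a convex combination of tanh-squashed tokens, every component stays in [-1, 1], regardless of the scale of the reference embedding.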

syang1993 · Jul 25 '18 08:07

OK, I will modify the r9y9/Tacotron-2 code, add your style embedding code to it, and then see how it works.

rishikksh20 · Jul 25 '18 09:07

@syang1993 What about the loss function?

rishikksh20 · Jul 25 '18 10:07

@rishikksh20 The style tokens are trained in an unsupervised way, so I guess we don't need an extra loss unless you have a specific purpose.
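To make "no extra loss" concrete, here is a minimal sketch of a Tacotron 2 style objective with GST added: the usual mel reconstruction terms (before and after the post-net) plus the stop-token term, with nothing style-specific. It assumes MSE for the mel terms and sigmoid cross-entropy for the stop token; mels are plain `[frames][mels]` lists, and the function names are hypothetical. The style tokens and reference encoder would receive gradients only through this reconstruction objective.

```python
import math


def mse(pred, target):
    """Mean squared error over all elements of two [frames][mels] lists."""
    diffs = [
        (p - t) ** 2
        for pred_frame, tgt_frame in zip(pred, target)
        for p, t in zip(pred_frame, tgt_frame)
    ]
    return sum(diffs) / len(diffs)


def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits, averaged."""
    losses = [
        max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
        for x, z in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


def tacotron2_gst_loss(mel_before, mel_after, mel_target, stop_logits, stop_targets):
    # Reconstruction terms before and after the post-net.
    mel_loss = mse(mel_before, mel_target) + mse(mel_after, mel_target)
    # Stop-token term (1.0 marks the final frame).
    stop_loss = bce_with_logits(stop_logits, stop_targets)
    # No style term: GST parameters are learned only through this objective.
    return mel_loss + stop_loss


target = [[0.0, 1.0], [2.0, 3.0]]
print(tacotron2_gst_loss(target, target, target, [0.0, 0.0], [0.0, 1.0]))
```

With perfect mel predictions and zero stop logits, only the stop-token term remains, which equals log 2.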

syang1993 · Jul 25 '18 10:07

@syang1993 Thanks! Is it possible to integrate Tacotron 1 with the WaveNet vocoder? The GST-Tacotron paper mentions that they tested it with WaveNet, so I think it should be possible.

rishikksh20 · Jul 25 '18 11:07

@rishikksh20 I tried integrating Tacotron 1 with WaveNet, but the results were worse than with Tacotron 2. Although the paper tested it with WaveNet, I guess it's easier to do this with Tacotron 2.

syang1993 · Jul 25 '18 13:07

@syang1993 OK, got it. The issue with Tacotron 1 might be due to the receptive field width. Anyway, regarding Tacotron 2: is just adding the style embedding part of your code enough? (I could check this myself, but training Tacotron 2 takes at least a week.) I ask because the GST paper also mentions some changes to the decoder.

Sorry to ask you so many questions, but this is kind of an urgent task for me and I have limited compute.

rishikksh20 · Jul 25 '18 16:07

@rishikksh20 I'm not sure what "receptive field width" means for Tacotron 1. I have trained Tacotron 2 before, and I don't think it takes that much time. But if you add the style embedding and reference encoder to Tacotron 2, it will take longer. Also, the decoder in this repo doesn't exactly match the paper; I just wanted to try the style embedding idea and see how it works. I guess you don't need to reproduce the paper's architecture exactly; you can modify it for your own purposes.
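As a rough illustration of where the reference encoder and style embedding plug into the model, the sketch below shows one possible wiring: summarize a reference mel into a fixed-size vector, then append the same style vector to every encoder time step before attention. Both functions are hypothetical. The reference encoder here is a mean-pooling placeholder (the GST paper uses a convolutional stack followed by a GRU), and the example feeds its output straight in as the "style embedding" instead of passing it through a style token layer. Tiling and concatenating is one common conditioning choice, not necessarily what this repo does.

```python
def reference_encoder_stub(ref_mel):
    """Hypothetical placeholder: mean-pool a [frames][mels] reference over time.

    The GST paper uses a conv stack + GRU here; this stub only yields a
    fixed-size vector so the wiring can be shown.
    """
    num_frames = len(ref_mel)
    dim = len(ref_mel[0])
    return [sum(frame[i] for frame in ref_mel) / num_frames for i in range(dim)]


def condition_encoder_outputs(encoder_outputs, style_embedding):
    """Append the same style embedding to every encoder time step.

    Output shape: [text_steps][encoder_dim + style_dim], which the attention
    and decoder would then consume in place of the plain encoder outputs.
    """
    return [list(frame) + list(style_embedding) for frame in encoder_outputs]


# Toy shapes: a 2-frame, 2-band reference and 3 encoder steps of size 4.
style = reference_encoder_stub([[1.0, 2.0], [3.0, 4.0]])
encoder_outputs = [[0.0] * 4 for _ in range(3)]
conditioned = condition_encoder_outputs(encoder_outputs, style)
print(style, len(conditioned), len(conditioned[0]))
```

Since the style vector is the same at every step, it acts as a global (utterance-level) condition, which is why the reference can come from a different utterance than the text at inference time.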

syang1993 · Jul 26 '18 17:07

Has anyone tried GST with Tacotron 2 and WaveNet? I am working on it now but don't have results yet, so this could all be for naught.

karamarieliu · Jul 31 '18 07:07

@karamarieliu Could you share your work with me? I am also working on this issue.

rishikksh20 · Jul 31 '18 09:07

@rishikksh20 I am currently running into an evaluation error, so I'll post it once that is solved. Right now I have T1 (Tacotron 1) with GST and WaveNet, if you want that. I'm still testing it, but it runs okay: https://github.com/karamarieliu/gst-tacotron-wavenet

karamarieliu · Aug 01 '18 00:08

@karamarieliu So GST-Tacotron 1 with wavenet_vocoder is running fine? Do you have any voice samples? I tried to integrate gst-tacotron (based on Tacotron 1) with wavenet_vocoder, but it didn't perform well. If you have any samples where the spectrogram was generated with gst-tacotron and the audio was synthesized with wavenet_vocoder, please share them with me. Also, the repo you mentioned doesn't explain how to use wavenet_vocoder with gst-tacotron.

rishikksh20 · Aug 01 '18 03:08

@karamarieliu Can you share here how to train WaveNet (with GST-Tacotron 1) and how to synthesize audio? I followed the … and figured out how to synthesize, but it would be better if you could elaborate a bit if you have some time. Otherwise, please share the command to train WaveNet, if possible.
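Whatever the exact training commands are, a WaveNet vocoder conditioned on mel spectrograms only works with Tacotron outputs if both models agree on the feature settings: sample rate, number of mel bands, and hop size. Each mel frame is upsampled to `hop_size` audio samples. The sketch below is a hypothetical sanity check for that. The helper names and dict keys are assumptions, not either repo's actual hparams API, and the values are illustrative only.

```python
def hop_size_from_ms(sample_rate, frame_shift_ms):
    """Convert a frame shift in milliseconds to a hop size in samples."""
    hop = sample_rate * frame_shift_ms / 1000.0
    if hop != int(hop):
        raise ValueError(
            f"{frame_shift_ms} ms is not a whole number of samples at {sample_rate} Hz"
        )
    return int(hop)


def find_mismatches(acoustic, vocoder, keys=("sample_rate", "num_mels", "hop_size")):
    """Return {key: (acoustic_value, vocoder_value)} for every setting that differs."""
    return {k: (acoustic.get(k), vocoder.get(k)) for k in keys if acoustic.get(k) != vocoder.get(k)}


def expected_num_samples(num_frames, hop_size):
    # With local conditioning upsampled by hop_size, N mel frames cover N * hop_size samples.
    return num_frames * hop_size


# Illustrative settings (hypothetical, not copied from either repo).
taco = {"sample_rate": 20000, "num_mels": 80, "hop_size": hop_size_from_ms(20000, 12.5)}
voc = {"sample_rate": 22050, "num_mels": 80, "hop_size": 256}
print(find_mismatches(taco, voc))
```

Any mismatch reported here would need to be resolved (for example, by retraining one model with the other's feature settings) before the vocoder can sensibly consume the Tacotron output. Mel normalization should also match, though it isn't checked here.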

rishikksh20 · Aug 01 '18 20:08