deep-voice-conversion

Did anyone finish sequence-to-sequence attention training?

Open benlaitang opened this issue 6 years ago • 2 comments

I wrote this with reference to https://github.com/keithito/tacotron, but it does not work. Feeding the ground-truth mel-spectrogram as the decoder input works, but feeding the predicted mel fails. Can anyone give me some advice?
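For readers unfamiliar with the two modes, the difference can be sketched roughly as follows. This is a toy illustration, not code from this repository or from keithito/tacotron: `decode`, `toy_step`, and the scalar "frames" are all hypothetical stand-ins for the real decoder and mel frames. It only shows how a small per-step error stays constant when the ground truth is fed back, but accumulates when the model's own prediction is fed back.

```python
from typing import Callable, List, Optional

Frame = List[float]  # stand-in for one mel frame


def decode(step: Callable[[Frame], Frame], go_frame: Frame, n_steps: int,
           targets: Optional[List[Frame]] = None) -> List[Frame]:
    """Run a (hypothetical) autoregressive decoder for n_steps.

    If `targets` is given, step t+1 receives the ground-truth frame t
    (teacher forcing). Otherwise it receives the previous prediction
    (free running, as at test time).
    """
    outputs = []
    prev = go_frame
    for t in range(n_steps):
        pred = step(prev)
        outputs.append(pred)
        prev = targets[t] if targets is not None else pred
    return outputs


def toy_step(prev: Frame) -> Frame:
    # Assumed toy model: the true dynamics are x -> x + 1,
    # and the model overshoots by 0.25 at every step.
    return [x + 1.25 for x in prev]


targets = [[float(t + 1)] for t in range(5)]  # ground truth: 1, 2, 3, 4, 5
go = [0.0]

teacher_forced = decode(toy_step, go, 5, targets)
free_running = decode(toy_step, go, 5)

tf_err = [abs(p[0] - y[0]) for p, y in zip(teacher_forced, targets)]
fr_err = [abs(p[0] - y[0]) for p, y in zip(free_running, targets)]
print("teacher-forced error per step:", tf_err)
print("free-running error per step:  ", fr_err)
```

In a real Tacotron-style decoder the error does not grow this neatly, but the mechanism is the same: the decoder only ever saw clean inputs during training, so its own imperfect outputs push it further off at each step.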

benlaitang, Dec 17 '18 08:12

I have the same issue. The audio from the validation process sounds great, but during testing the predicted mel spectrogram, rather than the ground truth, is fed into the next time step's pre-net, which produces quite abnormal audio. I also found that the alignment images are not diagonal, which suggests the attention mechanism has not been learned well. However, I don't know how to adjust the model or the training strategy.
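One rough way to put a number on "not diagonal" is to measure how much attention mass falls near the diagonal of the alignment matrix. The sketch below is only a heuristic I am assuming for illustration, not something this repository provides: `diagonal_score` and the `band` parameter are hypothetical names, and the alignment is assumed to be a list of rows (one per decoder step) that each sum to 1 over encoder steps. Real speech alignments are monotonic rather than perfectly linear, so the band should be fairly generous.

```python
from typing import List


def diagonal_score(alignment: List[List[float]], band: float = 0.2) -> float:
    """Average attention mass within a band around the diagonal.

    alignment[t][n] is the attention weight of decoder step t on encoder
    step n. `band` is the half-width of the band as a fraction of the
    encoder length. Returns a value in [0, 1]; higher means more diagonal.
    """
    n_dec = len(alignment)
    n_enc = len(alignment[0])
    half_width = band * n_enc
    total = 0.0
    for t, row in enumerate(alignment):
        # Encoder position a perfectly linear alignment would attend to.
        centre = t / max(n_dec - 1, 1) * (n_enc - 1)
        total += sum(w for n, w in enumerate(row) if abs(n - centre) <= half_width)
    return total / n_dec


# Toy 4x4 examples (made up for illustration).
diagonal = [[1.0 if n == t else 0.0 for n in range(4)] for t in range(4)]
uniform = [[0.25] * 4 for _ in range(4)]
stuck = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]  # always attends to the first input

print(diagonal_score(diagonal), diagonal_score(uniform), diagonal_score(stuck))
```

Tracking a score like this over training steps can show whether the attention is moving toward a diagonal at all, which is easier to compare across runs than eyeballing the alignment images.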

MorganCZY, Jan 31 '19 03:01

Yes, any clues about the Seq2Seq+Attention part of this network would be great! Please post an update if anyone finds a solution. Thanks!

wishvivek, Jan 31 '19 05:01