gst-tacotron: Mumbling in synthesis

Open • a-froghyar opened this issue 4 years ago • 1 comment

Hey, thanks for the implementation @syang1993!

I'm using this code to implement another paper and have run into some issues during synthesis. I get good alignment during training and the interim synthesised results sound good; however, during evaluation the synthesis is very unpredictable and sometimes fails to produce intelligible speech. Instead, it sounds like mumbling. This happens not only on long utterances but sometimes on short and mid-length texts too. I'm attaching a few alignment plots and audio examples.

I was wondering whether you've come across this before and whether you have any tips on where I should look to fix it. I trained the model with multi-head attention; do you reckon GMM attention would improve things much?

Attachments: eval-320000_ref-frankenstein_chp_13-4-align (alignment plot), mumbling_samples.zip
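To make the GMM question more concrete, here is a minimal sketch of one decoder step of Graves-style GMM attention in plain Python. It assumes softmax mixture weights and softplus step sizes and widths, which is close in spirit to the GMM variants compared in the paper linked in the reply below, but it is not taken from gst-tacotron's code. The MLP that would predict the raw values from the decoder state is omitted, and all function names are hypothetical. The property that matters here is that each component mean can only move forward, which is the usual argument for why GMM attention can be more robust than content-based multi-head attention on unseen text.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); always non-negative.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gmm_attention_step(prev_means, raw_weights, raw_deltas, raw_scales, num_inputs):
    """One decoder step of a Graves-style GMM attention (hypothetical helper).

    prev_means: K component means (in encoder positions) from the previous step.
    raw_weights, raw_deltas, raw_scales: K unconstrained values each; in a real
    model a small MLP on the decoder state predicts them (omitted here).
    Returns (alignment over num_inputs encoder positions, new means).
    """
    weights = softmax(raw_weights)
    # softplus makes each step non-negative, so the means can only move forward.
    means = [mu + softplus(d) for mu, d in zip(prev_means, raw_deltas)]
    # Small constant keeps the Gaussian width strictly positive.
    scales = [softplus(s) + 1e-5 for s in raw_scales]
    alignment = []
    for j in range(num_inputs):
        score = 0.0
        for w, mu, sigma in zip(weights, means, scales):
            # Normalized Gaussian density of this component evaluated at position j.
            z = math.sqrt(2.0 * math.pi * sigma ** 2)
            score += w / z * math.exp(-((j - mu) ** 2) / (2.0 * sigma ** 2))
        alignment.append(score)
    return alignment, means


# Usage: with fixed raw values the means advance by a constant amount per step.
means = [0.0, 0.0]
for _ in range(3):
    alignment, means = gmm_attention_step(means, [0.0, 1.0], [0.5, 0.0], [1.0, 1.0], 10)
print([round(m, 3) for m in means])  # → [2.922, 2.079]
```

Because the step size passes through softplus, the model cannot jump backward or loop over earlier text, which is one of the failure modes that can show up as mumbling when content-based attention loses its place.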

Jan 11 '21 16:01 a-froghyar

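This paper attempts to address this problem: "Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis", https://arxiv.org/abs/1910.10288 (audio samples: https://google.github.io/tacotron/publications/location_relative_attention/index.html)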

There appears to be a PyTorch implementation: https://github.com/bshall/Tacotron
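Dynamic Convolution Attention (DCA), the mechanism proposed in that paper, adds a log-prior term to the attention energies, computed by convolving the previous alignment with a causal beta-binomial filter. The sketch below shows only that prior term in plain Python; the static and dynamic filter energies are omitted and stand in as an optional `other_energies` list. The filter length of 11 and alpha=0.1, beta=0.9 are assumed here to match the paper's settings and should be checked against it. The function names are hypothetical and are not taken from bshall/Tacotron.

```python
import math


def beta_binomial_pmf(n, alpha, beta):
    """Beta-binomial probabilities P(k) for k = 0..n (a filter of length n + 1)."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    pmf = []
    for k in range(n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_p = log_comb + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
        pmf.append(math.exp(log_p))
    return pmf


def causal_prior(prev_alignment, prior):
    """Causal convolution: position j only receives mass from positions j - k, k >= 0."""
    out = []
    for j in range(len(prev_alignment)):
        out.append(sum(prior[k] * prev_alignment[j - k]
                       for k in range(len(prior)) if j - k >= 0))
    return out


def prior_attention_step(prev_alignment, prior, other_energies=None, eps=1e-6):
    """One alignment step from the log-prior term alone (hypothetical helper).

    other_energies stands in for DCA's static/dynamic filter energies, which
    are omitted here; it defaults to zeros.
    """
    p = causal_prior(prev_alignment, prior)
    if other_energies is None:
        other_energies = [0.0] * len(p)
    # Floor the prior before taking the log so empty positions don't give -inf.
    energies = [e + math.log(max(x, eps)) for e, x in zip(other_energies, p)]
    m = max(energies)
    exps = [math.exp(e - m) for e in energies]
    total = sum(exps)
    return [x / total for x in exps]


prior = beta_binomial_pmf(10, 0.1, 0.9)  # length 11; alpha/beta values assumed
alignment = [1.0] + [0.0] * 39           # start focused on the first input
for step in range(5):
    alignment = prior_attention_step(alignment, prior)
    expected = sum(j * a for j, a in enumerate(alignment))
    print(step, round(expected, 2))
```

Starting from an alignment focused on the first input, each step moves the expected position forward by roughly one input (the mean of this beta-binomial is 1), and because the filter only looks at earlier positions it can only shift probability mass forward. That location-relative behaviour, rather than content matching, is what the paper credits for robustness on long and unseen utterances.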

Feb 15 '21 08:02 EFHIII
