gst-tacotron

Mumbling in synthesis

Open · a-froghyar opened this issue 4 years ago • 1 comment

Hey, thanks for the implementation @syang1993!

I'm using this code to implement another paper and I've run into some issues during synthesis. Alignment is good during training and the interim synthesised results sound good, but during evaluation the synthesis is very unpredictable and sometimes fails to produce intelligible speech; it sounds more like mumbling. This happens not only on long utterances but sometimes on short and mid-length texts too. I'm attaching a few alignment plots and audio examples.

I was wondering if you've come across this before and if you have any tips on where I should look to fix it? I trained the model using multi-head attention; do you reckon GMM attention would improve things much?

Attachments: eval-320000_ref-frankenstein_chp_13-4-align, eval-320000_ref-frankenstein_chp_13-3-2-align, mumbling_samples.zip

a-froghyar · Jan 11 '21 16:01

This paper attempts to address this: https://arxiv.org/abs/1910.10288 (project page: https://google.github.io/tacotron/publications/location_relative_attention/index.html)

There appears to be a PyTorch implementation: https://github.com/bshall/Tacotron
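For context on why a GMM-style (location-relative) attention might help with this kind of failure, here is a minimal NumPy sketch of a single attention step. It is not taken from either repository: the function name `gmm_attention_step`, the `(3, K)` parameter layout, and the softplus/softmax parameterisation are assumptions chosen for illustration. In a real model the raw parameters would come from a small network over the attention RNN state. The point is that each component's mean can only move forward along the encoder sequence, so the alignment cannot jump backwards.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); always positive.
    return np.logaddexp(0.0, x)


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def gmm_attention_step(raw_params, prev_mu, num_encoder_steps):
    """One decoder step of a GMM-style attention (illustrative only).

    raw_params: array of shape (3, K) with unnormalised mixture weights,
                step sizes, and widths for K mixture components.
    prev_mu:    array of shape (K,), component means from the previous step.
    Returns the alignment over encoder positions and the updated means.
    """
    w_hat, delta_hat, sigma_hat = raw_params
    weights = softmax(w_hat)            # mixture weights sum to 1
    delta = softplus(delta_hat)         # non-negative step: means never move back
    sigma = softplus(sigma_hat) + 1e-5  # strictly positive width
    mu = prev_mu + delta

    positions = np.arange(num_encoder_steps, dtype=float)[:, None]  # (T, 1)
    norm = np.sqrt(2.0 * np.pi * sigma ** 2)
    components = weights / norm * np.exp(-((positions - mu) ** 2) / (2.0 * sigma ** 2))
    alignment = components.sum(axis=1)  # (T,)
    return alignment, mu


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.zeros(5)
    for step in range(3):
        # Random stand-in for the network output at each decoder step.
        alignment, mu = gmm_attention_step(rng.normal(size=(3, 5)), mu, 40)
        print(step, int(alignment.argmax()), np.round(mu, 2))
```

Because the attention position is advanced by a non-negative increment rather than recomputed from content at every step, this family of mechanisms tends to be less prone to losing its place at inference time. Whether it fixes the mumbling here would still need to be tested.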

EFHIII · Feb 15 '21 08:02