TTS Voice Corruption on Longer Strings of text

Open • KeithIMyers opened this issue 6 years ago • 1 comment

Good morning. I really want to use WaveRNN for a few personal projects, such as reading snippets of my morning news. So far it works great on small chunks of text (about the size of a tweet), but when a longer string of text is introduced, the voice goes nuts. Here is an example: https://drive.google.com/drive/folders/13zcgZ_2dS6mm_ZInejUcf2qI5kEytFH0?usp=sharing

The block of text I used was "While this works great on simple scripts, it tends to fail on longer blocks of text. I find that anything above 45 words is almost impossible for the tool to generate a clean recording. Any ideas on what is happening or do you think additional training may help?. I am at a loss for words but want to add more to bulk this up"

KeithIMyers, Aug 07 '19 20:08

@KeithIMyers Yes, you're right - the current model fails on very long sentences. I'll have to look into more robust attention mechanisms for that use case.

fatchord, Aug 14 '19 16:08
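
Until the attention mechanism is more robust, one possible workaround (not confirmed anywhere in this thread) is to split long input into short chunks, synthesize each chunk separately, and join the audio afterwards. Below is a minimal sketch of the splitting step only. `chunk_text` is a hypothetical helper, not part of WaveRNN, and the default `max_words=40` is an assumption chosen to stay under the roughly 45-word limit reported above.

```python
import re


def chunk_text(text, max_words=40):
    """Split text into chunks of at most max_words words.

    Hypothetical helper, not part of WaveRNN. Chunks break at sentence
    boundaries where possible; a sentence longer than max_words is cut
    at word boundaries.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # Cut an over-long sentence into max_words-sized pieces.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if this sentence does not fit in the current one.
        if len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = (
        "While this works great on simple scripts, it tends to fail on "
        "longer blocks of text. I find that anything above 45 words is "
        "almost impossible for the tool to generate a clean recording."
    )
    for chunk in chunk_text(sample, max_words=20):
        print(chunk)
```

Each chunk could then be passed to the synthesis script one at a time. Whether the joins between chunks sound natural would need to be checked by ear.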