TTS Voice Corruption on Longer Strings of Text
Good morning, I really want to use WaveRNN for a few personal projects, such as reading snippets of my morning news aloud. So far it works great on short chunks of text (about the length of a tweet), but when I give it a longer passage, the voice becomes badly corrupted. Here is an example: https://drive.google.com/drive/folders/13zcgZ_2dS6mm_ZInejUcf2qI5kEytFH0?usp=sharing
The block of text I used was "While this works great on simple scripts, it tends to fail on longer blocks of text. I find that anything above 45 words is almost impossible for the tool to generate a clean recording. Any ideas on what is happening or do you think additional training may help?. I am at a loss for words but want to add more to bulk this up"
@KeithIMyers Yes, you're right: the current model fails on very long sentences. I'll have to look into more robust attention mechanisms for that use case.
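Until the attention mechanism handles long inputs, one common workaround (not proposed in this thread, so treat it as a sketch) is to exploit the fact that short inputs synthesize cleanly: split the text into sentence-aligned chunks below a word limit and synthesize each chunk separately. The helper `chunk_text` below is hypothetical and not part of WaveRNN, and the 30-word default is an assumption chosen to stay well under the roughly 45-word point where the original poster saw failures. Sentences longer than the limit are hard-split on word boundaries.

```python
import re
from typing import List


def chunk_text(text: str, max_words: int = 30) -> List[str]:
    """Split text into sentence-aligned chunks of at most max_words words.

    Hypothetical helper: the word limit is an assumption, not a WaveRNN setting.
    """
    # Split after '.', '!' or '?' when followed by whitespace; drop empty pieces.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A single sentence longer than the limit is hard-split on word boundaries.
        while len(words) > max_words:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk would then be passed to the model on its own, and the resulting waveforms joined (for example with `numpy.concatenate`, optionally with a short stretch of silence between chunks). Splitting on sentence boundaries keeps the prosody of each sentence intact, at the cost of losing some intonation continuity across sentence breaks.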