tacotron
Using a GAN to enhance the spectrograms
A post on the Google Research Blog says:
Most neural text-to-speech (TTS) systems produce over-smoothed spectrograms. When applied to the Tacotron TTS system, a GAN can recreate some of the realistic-texture, which reduces artifacts in the resulting audio.
https://research.googleblog.com/2017/12/tfgan-lightweight-library-for.html
Did you explore this any further? I am inclined to go this way as well.
Any update on this?