awesome-tts-samples

List of TTS papers with audio samples provided by the authors. The last line of each entry names the spectrogram inversion method (vocoder) used.
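For entries marked Griffin-Lim, spectrogram inversion means recovering a waveform from a magnitude-only spectrogram by repeatedly re-estimating the discarded phase. Below is a minimal NumPy sketch of that iteration. It is not the code used by any listed paper. It assumes a periodic Hann window, `n_fft=512`, `hop=128` and a linear-frequency magnitude spectrogram, and the helper names `stft`, `istft` and `griffin_lim` are hypothetical. A model that predicts mel spectrograms would first need its mel bins mapped back to linear frequency.

```python
import numpy as np


def _window(n_fft):
    # Periodic Hann window (assumed choice for this sketch).
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform; returns shape (frames, n_fft // 2 + 1)."""
    w = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Least-squares inverse STFT via window-weighted overlap-add."""
    w = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * w
        norm[i * hop:i * hop + n_fft] += w ** 2
    # Guard against division by zero where the window sum vanishes.
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from a random phase; only the magnitude is known.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * phase, n_fft, hop)
        # Keep the target magnitude, adopt the phase of the re-analysed signal.
        phase = np.exp(1j * np.angle(stft(signal, n_fft, hop)))
    return istft(magnitude * phase, n_fft, hop)
```

Each iteration keeps the target magnitude and replaces only the phase, and with the least-squares overlap-add used here the magnitude mismatch does not increase from one iteration to the next. Libraries such as librosa ship a tuned version as `librosa.griffinlim`. Neural vocoders such as WaveNet or WaveGlow replace this loop with a learned model.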

For a more comprehensive list of important TTS papers, I recommend reading xcmyz/speech-synthesis-paper by Zhengxi Liu.

2020

FastPitch - FastPitch: Parallel Text-to-speech with Pitch Prediction
- https://fastpitch.github.io/
- WaveGlow
EATS - End-to-End Adversarial Text-to-Speech
- https://deepmind.com/research/publications/End-to-End-Adversarial-Text-to-Speech
- End-to-end model
Glow-TTS - Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
- https://jaywalnut310.github.io/glow-tts-demo
- WaveGlow
Flowtron - Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
- https://nv-adlr.github.io/Flowtron
- WaveGlow

2019

Tacotron2+DCA - Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
- https://google.github.io/tacotron/publications/location_relative_attention
- WaveRNN
GAN-TTS - High Fidelity Speech Synthesis with Adversarial Networks
- https://storage.googleapis.com/deepmind-media/research/abstract.wav
- End-to-end model (built on top of 200 Hz linguistic and log-pitch features)
Multi-lingual Tacotron2 - Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
- https://google.github.io/tacotron/publications/multilingual
- WaveRNN
MelNet - MelNet: A Generative Model for Audio in the Frequency Domain
- https://audio-samples.github.io
- https://sjvasquez.github.io/blog/melnet
- Gradient-based spectrogram inversion
FastSpeech - FastSpeech: Fast, Robust and Controllable Text to Speech
- https://speechresearch.github.io/fastspeech
- WaveGlow
ParaNet - Parallel Neural Text-to-Speech
- https://parallel-neural-tts-demo.github.io
- WaveVAE, ClariNet, WaveNet

2018

Transformer-TTS - Neural Speech Synthesis with Transformer Network
- https://neuraltts.github.io/transformertts
- WaveNet
Multi-speaker Tacotron2 - Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
- https://google.github.io/tacotron/publications/speaker_adaptation
- WaveNet
Tacotron2+GST - Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
- https://google.github.io/tacotron/publications/global_style_tokens
- Griffin-Lim

2017

Tacotron2 - Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
- https://google.github.io/tacotron/publications/tacotron2
- WaveNet
Tacotron - Tacotron: Towards End-to-End Speech Synthesis
- https://google.github.io/tacotron/publications/tacotron
- Griffin-Lim

TODO