Awesome list of TTS papers with audio samples

awesome-tts-samples

A list of TTS papers with audio samples provided by the authors. The last line under each paper names the spectrogram inversion method (vocoder) used for the samples.

For a more comprehensive list of important TTS papers, I recommend reading xcmyz/speech-synthesis-paper, written by Zhengxi Liu.

2020

  • FastPitch - FastPitch: Parallel Text-to-speech with Pitch Prediction
    • https://fastpitch.github.io/
    • WaveGlow
  • EATS - End-to-End Adversarial Text-to-Speech
    • https://deepmind.com/research/publications/End-to-End-Adversarial-Text-to-Speech
    • End-to-end model
  • Glow-TTS - Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
    • https://jaywalnut310.github.io/glow-tts-demo
    • WaveGlow
  • Flowtron - Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
    • https://nv-adlr.github.io/Flowtron
    • WaveGlow

2019

  • Tacotron2+DCA - Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
    • https://google.github.io/tacotron/publications/location_relative_attention
    • WaveRNN
  • GAN-TTS - High Fidelity Speech Synthesis with Adversarial Networks
    • https://storage.googleapis.com/deepmind-media/research/abstract.wav
    • End-to-end model (built on top of 200 Hz linguistic and log-pitch features)
  • Multi-lingual Tacotron2 - Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
    • https://google.github.io/tacotron/publications/multilingual
    • WaveRNN
  • MelNet - MelNet: A Generative Model for Audio in the Frequency Domain
  • FastSpeech - FastSpeech: Fast, Robust and Controllable Text to Speech
    • https://speechresearch.github.io/fastspeech
    • WaveGlow
  • ParaNet - Parallel Neural Text-to-Speech
    • https://parallel-neural-tts-demo.github.io
    • WaveVAE, ClariNet, WaveNet

2018

  • Transformer-TTS - Neural Speech Synthesis with Transformer Network
    • https://neuraltts.github.io/transformertts
    • WaveNet
  • Multi-speaker Tacotron2 - Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
    • https://google.github.io/tacotron/publications/speaker_adaptation
    • WaveNet
  • Tacotron2+GST - Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
    • https://google.github.io/tacotron/publications/global_style_tokens
    • Griffin-Lim

2017

  • Tacotron2 - Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
    • https://google.github.io/tacotron/publications/tacotron2
    • WaveNet
  • Tacotron - Tacotron: Towards End-to-End Speech Synthesis
    • https://google.github.io/tacotron/publications/tacotron
    • Griffin-Lim

Contributing

TODO