Awesome-Singing-Voice-Synthesis-and-Singing-Voice-Conversion
Awesome Singing Voice Synthesis and Singing Voice Conversion
A paper and project list about cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related works of interest (such as Music Synthesis, Automatic Music Transcription, Automatic MOS Prediction, and SSL-based ASR).
Pull requests are welcome, or contact me via email ([email protected]) to suggest new papers and works.
Paper List
Journals
IEEE/ACM TASLP, IEEE JSTSP, JSLHR, IEEE TPAMI
Conferences
NeurIPS, ICLR, ICML, IJCAI, AAAI, ACL, NAACL, EMNLP, ISMIR, ICASSP, INTERSPEECH, ACM MM, ICME
Workshops
ASRU, SLT
Singing Voice Conversion (Other Key Words: SVC, Singing Style Transfer)
- Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals | ICASSP 2022 | 🎧Demo
- Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher | INTERSPEECH 2022 | ✔️Code | 🎧Demo
- A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion | INTERSPEECH 2022 | 🎧Demo
- Controllable and Interpretable Singing Voice Decomposition via Assem-VC | NeurIPS 2021 Workshop | 🎧Demo
- DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion | ASRU 2021 | 🎧Demo
- FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation | ICME 2021 | 🎧Demo
- Unsupervised WaveNet-based Singing Voice Conversion Using Pitch Augmentation and Two-phase Approach | 2021 | ✔️Code | 🎧Demo
- Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding | 2021 | 🎧Demo
- Zero-shot Singing Voice Conversion | ISMIR 2020 | 🎧Demo
- PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network | ICASSP 2020 | 🎧Demo
- DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System | INTERSPEECH 2020 | 🎧Demo
- Unsupervised Cross-Domain Singing Voice Conversion | INTERSPEECH 2020 | 🎧Demo
- VAW-GAN for Singing Voice Conversion with Non-parallel Training Data | APSIPA 2020 | ✔️Code | 🎧Demo
- Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training | 2020 | 🎧Demo | Unofficial Code
Dataset
- M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | NeurIPS 2022 | 🔽Apply&Download | 🎧Demo
- NHSS: A Speech and Singing Parallel Database | 🔽Apply&Download
Singing Technique Conversion
- Zero-shot Singing Technique Conversion | CMMR 2021
Voice Conversion (Other Key Words: VC, Voice Cloning, Voice Style Transfer)
- End-to-End Zero-Shot Voice Style Transfer with Location-Variable Convolutions | 2022 | 🎧Demo
- A Comparative Study of Self-supervised Speech Representation Based Voice Conversion | IEEE JSTSP 2022
- Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme | ICLR 2022 | ✔️Code | 🎧Demo
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone | ICML 2022 | ✔️Code | 🎧Demo | 🎧Demo | 📝Blog
- S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations | ICASSP 2022 | ✔️Code
- A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion | ICASSP 2022 | ✔️Code | 🎧Demo
- Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques | ICASSP 2022 | ✔️Code | 🎧Demo
- NVC-Net: End-to-End Adversarial Voice Conversion | ICASSP 2022 | ✔️Code | 🎧Demo
- Robust Disentangled Variational Speech Representation Learning for Zero-Shot Voice Conversion | ICASSP 2022 | 🎧Demo
- Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features | ICASSP 2022 | 🎧Demo
- Toward Degradation-Robust Voice Conversion | ICASSP 2022
- DGC-vector: A new speaker embedding for zero-shot voice conversion | ICASSP 2022 | 🎧Demo
- Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers | INTERSPEECH 2022 | 🎧Demo
- Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion | INTERSPEECH 2022 | 🎧Demo
- Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling | IEEE/ACM TASLP 2021 | ✔️Code | 🎧Demo
- Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations | NeurIPS 2021 | 🎧Demo | Unofficial Code
- Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning | ICLR 2021
- Global Rhythm Style Transfer Without Text Transcriptions | ICML 2021 | ✔️Code
- AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization | ICASSP 2021 | ✔️Code | 🎧Demo
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion | INTERSPEECH 2021 Best Paper Award | ✔️Code | 🎧Demo
- S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations | INTERSPEECH 2021 | ✔️Code | 🎧Demo
- Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder | INTERSPEECH 2021 | ✔️Code | 🎧Demo
- Speech Resynthesis from Discrete Disentangled Self-Supervised Representations | INTERSPEECH 2021 | 🎧Demo
- On Prosody Modeling for ASR+TTS based Voice Conversion | ASRU 2021 | 🎧Demo
- MediumVC: Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features | 2021 | ✔️Code | 🎧Demo
- An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning | IEEE/ACM TASLP 2020
- Unsupervised Speech Decomposition via Triple Information Bottleneck | ICML 2020 | ✔️Code
- AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | ICML 2019 | ✔️Code | 🎧Demo
- One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization | INTERSPEECH 2019 | ✔️Code
Dataset
- CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit | 2019 | 🔽Apply&Download
Emotional Voice Conversion
- Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion | INTERSPEECH 2022 | 🎧Demo
- Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis | INTERSPEECH 2022 | 🎧Demo
- Emotion Intensity and its Control for Emotional Voice Conversion | IEEE Transactions on Affective Computing | ✔️Code | 🎧Demo
- Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | INTERSPEECH 2021 | ✔️Code | 🎧Demo
- Textless Speech Emotion Conversion using Discrete and Decomposed Representations | 2021 | 🎧Demo
- Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion | INTERSPEECH 2020 | ✔️Code | 🎧Demo
- Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data | Odyssey 2020 | ✔️Code | 🎧Demo
Dataset
- Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset | ICASSP 2021 | 🔽Apply&Download | 🎧Demo
Singing Voice Synthesis (Other Key Words: SVS)
- WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training | 2022 | 🎧Demo
- DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism | AAAI 2022 | ✔️Code | 🎧Demo
- Learning the Beauty in Songs: Neural Singing Voice Beautifier | ACL 2022 | ✔️Code | 🎧Demo
- Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis | INTERSPEECH 2022 | ✔️Code
- SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy | INTERSPEECH 2022 | ✔️Code
- WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses | INTERSPEECH 2022 | 🎧Demo
- Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System | IEEE/ACM TASLP 2021 | ✔️Code
- HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis | 2020 | 🎧Demo
Dataset
- M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | NeurIPS 2022 | 🔽Apply&Download | 🎧Demo
- PopCS | AAAI 2022 | 🔽Apply&Download
- Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis | INTERSPEECH 2022 | 🔽Apply&Download
High-Quality Speech Synthesis (Other Key Words: Text-to-Speech, TTS)
- BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis | ICLR 2022 | ✔️Code | 🎧Demo
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | IJCAI 2022 | ✔️Code | 🎧Demo
Vocoder
- Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus | ACM MM 2021 | 🔽Apply&Download | ✔️Code | 🎧Demo
- Towards achieving robust universal neural vocoding | INTERSPEECH 2019 | ✔️Code | 🎧Demo | Unofficial Code
Music Synthesis
- Multi-instrument Music Synthesis with Spectrogram Diffusion | ISMIR 2022 | ✔️Code | 🎧Demo
- Musika! Fast Infinite Waveform Music Generation | ISMIR 2022 | ✔️Code | 🎧Demo
Automatic Music Transcription
- MT3: Multi-Task Multitrack Music Transcription | ICLR 2022 | ✔️Code
- Omnizart: A General Toolbox for Automatic Music Transcription | The Open Journal 2021 | ✔️Code | 🎧Demo
Self-supervised/Unsupervised ASR
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | IEEE JSTSP 2022 | ✔️Code | ✔️Code
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training | ICASSP 2022 | ✔️Code | ✔️Code
- Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition | ICASSP 2022 | ✔️Code | ✔️Code
- Pseudo-Labeling for Massively Multilingual Speech Recognition | ICASSP 2022 | ✔️Code | ✔️Code
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | IEEE/ACM TASLP 2021 | ✔️Code | ✔️Code
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data | ICML 2021 | ✔️Code | ✔️Code | ✔️Code
- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale | 2021 | ✔️Code | ✔️Code
- Simple and Effective Zero-shot Cross-lingual Phoneme Recognition | 2021 | ✔️Code | ✔️Code
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech | IEEE/ACM TASLP 2020 | ✔️Code
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | NeurIPS 2020 | ✔️Code | ✔️Code
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations | ICLR 2020 | ✔️Code | ✔️Code
- Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders | ICASSP 2020 | ✔️Code
- fairseq S2T: Fast Speech-to-Text Modeling with fairseq | AACL 2020 | ✔️Code | ✔️Code
- Unsupervised Cross-lingual Representation Learning for Speech Recognition | 2020 | ✔️Code | ✔️Code
- Representation Learning with Contrastive Predictive Coding | 2019 | ✔️Code
Automatic MOS Prediction
- The VoiceMOS Challenge 2022 | INTERSPEECH 2022
- Utilizing Self-supervised Representations for MOS Prediction | INTERSPEECH 2021 | ✔️Code
Speech Data Augmentation
- Data Augmenting Contrastive Learning of Speech Representations in the Time Domain | SLT 2021 | ✔️Code
Speech Insertion
- RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion | INTERSPEECH 2022 | 🎧Demo
Prosody-Aware
Adversarial Attack
Toolkits
ASR Toolkits
TTS Toolkits
Music Processing Toolkits
Data Annotation/Alignment Toolkits
- Praat: doing phonetics by computer
- Parselmouth - Praat in Python, the Pythonic way
- Montreal Forced Aligner
Other Frameworks and Toolkits
Competitions
References
- Awesome Speech Recognition Speech Synthesis Papers
- Awesome Voice Conversion Papers Projects
- TTS Papers
- 🐸 TTS papers
- Speech Synthesis Paper
- Papers With Code: Voice Conversion
- Papers With Code: Singing Voice Conversion
- Papers With Code: Singing Voice Synthesis
- Awesome Open Source: Voice Conversion
- ICASSP 2021 Paper List-VC