Real-Time Voice Cloning

This repository is an implementation of Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (SV2TTS) with a vocoder that works in real time. Feel free to check my thesis if you're curious, or if you're looking for information I haven't documented here. In particular, I recommend a quick look at the figures after the introduction.

SV2TTS is a three-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio, then uses that representation to condition a text-to-speech model trained to generate new voices.

Video demonstration (click the play button):

Papers implemented

arXiv ID	Designation	Title	Implementation source
1806.04558	SV2TTS	Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	This repo
1802.08435	WaveRNN (vocoder)	Efficient Neural Audio Synthesis	fatchord/WaveRNN
1712.05884	Tacotron 2 (synthesizer)	Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions	Rayhane-mamah/Tacotron-2
1710.10467	GE2E (encoder)	Generalized End-To-End Loss for Speaker Verification	This repo

Get Started

Requirements

Use setup.sh on Linux or setup.bat on Windows to install the dependencies and requirements. Currently only Python 3.7.x is supported.

Windows Install Requirements
- When installing Python, make sure Python is added to the PATH.
- When installing conda, make sure you install it 'just for me'.
- When installing MS Build Tools, you only need the C++ package, which requires around 4.7 GB. After installing the build tools, restart the computer to complete the installation, then rerun setup.bat to finish the setup process.

Install Manually:

Install PyTorch (>=1.0.1) first, then run pip install -r requirements.txt to install the necessary packages.
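If you are unsure whether an existing PyTorch install meets the minimum version, a small check like the one below can help before running pip. This is only a sketch: the helper names (parse_version, torch_is_recent_enough, installed_torch_version) are made up for illustration and are not part of this repo, and the only fact taken from this README is the >=1.0.1 requirement.

```python
import re

MINIMUM_TORCH = (1, 0, 1)  # minimum version stated in this README


def parse_version(text):
    """Return the leading numeric components of a version string as a 3-tuple.

    Suffixes such as "+cu117" are ignored; missing components count as 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def torch_is_recent_enough(version_text, minimum=MINIMUM_TORCH):
    """True when version_text is at least the minimum (major, minor, patch)."""
    return parse_version(version_text) >= minimum


def installed_torch_version():
    """Return the installed torch version string, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return str(torch.__version__)


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("PyTorch is not installed")
    elif torch_is_recent_enough(found):
        print(f"PyTorch {found} meets the >=1.0.1 requirement")
    else:
        print(f"PyTorch {found} is older than 1.0.1")
```

The version comparison is done on plain tuples so that it does not depend on any extra packaging library.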

After install Steps

Next, you will need pretrained models if you don't plan to train your own. These models were trained on a CUDA device, so they may produce unreliable results on a CPU; as of 5/1/20, new CPU models still need to be produced. Download the models and uncompress them in this root folder. If done correctly, this should produce the folders encoder/saved_models, synthesizer/saved_models, and vocoder/saved_models inside the repository root.
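To confirm that the models were unpacked in the right place, you could run a quick check like the following from the repository root. It is a sketch under one assumption: that each of the three saved_models folders named above should exist and contain at least one file. The function name missing_model_dirs is hypothetical, and the script does not verify that the model files themselves are valid.

```python
from pathlib import Path

# Folder names taken from this README, relative to the repository root.
EXPECTED_MODEL_DIRS = (
    "encoder/saved_models",
    "synthesizer/saved_models",
    "vocoder/saved_models",
)


def missing_model_dirs(repo_root):
    """Return the expected model folders that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for relative in EXPECTED_MODEL_DIRS:
        folder = root / relative
        # A folder counts as missing if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(relative)
    return missing


if __name__ == "__main__":
    problems = missing_model_dirs(".")
    if problems:
        print("Missing or empty model folders:")
        for relative in problems:
            print(f"  {relative}")
    else:
        print("All three saved_models folders are present")
```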

Test installation

Once you believe everything is in place, test the program by running python demo_cli.py. If all tests pass, you're good to go. To run on the CPU, pass the --cpu option.

Generate Audio from dataset

There are a few preconfigured options for datasets. One in particular, LibriSpeech/train-clean-100, is set up to work with demo_toolbox.py. You can place the downloaded dataset anywhere, but creating a folder named datasets in this directory is recommended, since all scripts use that directory by default.

To run the toolbox, use python demo_toolbox.py if you followed the recommendation for the datasets directory location. Otherwise, pass the full path to the dataset with the -d option.
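The two ways of launching the toolbox can be sketched as a small helper that builds the command line. This is an illustration only: toolbox_command is a hypothetical name, not something shipped with this repo, and the only flag it relies on is the -d option described above.

```python
import sys
from pathlib import Path


def toolbox_command(repo_root, datasets_root=None):
    """Build the argument list for launching demo_toolbox.py.

    With datasets_root=None the toolbox is left to use its default
    datasets location; otherwise the full path is passed with -d.
    """
    command = [sys.executable, str(Path(repo_root) / "demo_toolbox.py")]
    if datasets_root is not None:
        # resolve() turns a relative path into a full path.
        command += ["-d", str(Path(datasets_root).resolve())]
    return command


if __name__ == "__main__":
    # Print the command rather than running it, so this sketch has no side effects.
    print(" ".join(toolbox_command(".")))
```

To actually start the toolbox, the returned list can be handed to subprocess.run(command, check=True).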

To set the speaker, you'll need an input audio file. Use Browse in the toolbox to select your own audio file, or Record to capture your own voice.

The toolbox supports other datasets, including dev-train.

If you are running an X-server or if you have the error Aborted (core dumped), see this issue.

Contributions & Issues

Original Author CorentinJ News

13/11/19: I'm sorry that I can't maintain this repo as much as I wish I could. I'm working full time as of June 2019 on improving voice cloning techniques and I don't have the time to share my improvements here. Plus this repo relies on a lot of old tensorflow code and it's hard to work with. If you're a researcher, then this repo might be of use to you. If you just want to clone your voice, do check our demo on Resemble.AI - it will give much better results than this repo and will not require a complex setup.

20/08/19: I'm working on resemblyzer, an independent package for the voice encoder. You can use your trained encoder models from this repo with it.

06/07/19: Need to run within a docker container on a remote server? See here.

25/06/19: Experimental support for low-memory GPUs (~2gb) added for the synthesizer. Pass --low_mem to demo_cli.py or demo_toolbox.py to enable it. It adds a big overhead, so it's not recommended if you have enough VRAM.
