CycleGAN-VC2
Voice Conversion with CycleGAN (voice cloning/voice conversion): CycleGAN-VC2
CycleGAN-VC2-PyTorch
Chinese README | English
This code is a PyTorch implementation of the paper CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion, a notable work on voice conversion and voice cloning.
- [x] Dataset
  - [ ] VC
  - [x] Chinese Male Speakers (S0913 from AISHELL-Speech & GaoXiaoSong: a Chinese star)
- [x] Usage
  - [x] Training
  - [x] Example
- [ ] Demo
- [x] Reference
Update
2020.11.17: fixed issues: reimplemented the second-step adversarial loss.
2020.08.27: added the second-step adversarial loss (contributed by Jeffery-zhang-nfls).
CycleGAN-VC2
Project Page
To advance the research on non-parallel VC, we propose CycleGAN-VC2, an improved version of CycleGAN-VC that incorporates three new techniques: an improved objective (two-step adversarial losses), an improved generator (2-1-2D CNN), and an improved discriminator (PatchGAN).
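The two-step adversarial objective can be sketched as below, assuming least-squares GAN losses plus L1 cycle-consistency and identity-mapping losses as described for CycleGAN-VC/CycleGAN-VC2. The function names, the extra discriminator label `D2_A`, and the weights `lambda_cyc=10`, `lambda_id=5` are illustrative assumptions, not this repository's code; check `train.py` for the actual values. Only the A->B direction is shown; the full objective adds the symmetric B->A terms. The "second step" is the extra adversarial term that scores the cycle-converted features G_BA(G_AB(x_A)).

```python
# Hypothetical sketch of a CycleGAN-VC2-style generator objective (A -> B direction only).
# Plain lists of floats stand in for tensors; a real model would use PyTorch tensors.

def mean(xs):
    return sum(xs) / len(xs)

def lsgan_generator_loss(d_fake):
    """Least-squares adversarial loss: push discriminator outputs on fakes toward 1."""
    return mean([(d - 1.0) ** 2 for d in d_fake])

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: real outputs toward 1, fake outputs toward 0."""
    return mean([(d - 1.0) ** 2 for d in d_real]) + mean([d ** 2 for d in d_fake])

def l1(a, b):
    """Mean absolute error between two equal-length feature lists."""
    return mean([abs(x - y) for x, y in zip(a, b)])

def generator_objective(d_b_fake, d_a2_cycled, x_a, cycled_a, x_b, identity_b,
                        lambda_cyc=10.0, lambda_id=5.0):
    """
    d_b_fake:    D_B(G_AB(x_A))          first-step adversarial term
    d_a2_cycled: D2_A(G_BA(G_AB(x_A)))   second-step adversarial term (extra discriminator)
    cycled_a:    G_BA(G_AB(x_A))         used by the cycle-consistency loss
    identity_b:  G_AB(x_B)               used by the identity-mapping loss
    lambda_cyc, lambda_id: assumed weights, not taken from this repository.
    """
    adv_step1 = lsgan_generator_loss(d_b_fake)
    adv_step2 = lsgan_generator_loss(d_a2_cycled)
    cycle = l1(x_a, cycled_a)
    identity = l1(x_b, identity_b)
    return adv_step1 + adv_step2 + lambda_cyc * cycle + lambda_id * identity
```

Keeping the second-step term as a separate discriminator output makes it easy to see that it penalizes over-smoothing in the reconstructed features, which the L1 cycle loss alone does not.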
This repository contains:
- model code that implements the paper.
- an audio preprocessing script that creates a cache of the training data.
- training scripts to train the model.
- examples of voice conversion: results converted by the trained model.
Table of Contents
- CycleGAN-VC2-PyTorch
  - Update
- CycleGAN-VC2
  - Project Page
  - Table of Contents
  - Requirement
  - Usage
    - preprocess
    - train
  - Pretrained
  - Demo
  - Star-History
  - Reference
  - Donation
  - License
Requirement
pip install -r requirements.txt
Usage
preprocess
python preprocess_training.py
is shorthand for (default arguments):
python preprocess_training.py --train_A_dir ./data/S0913/ --train_B_dir ./data/gaoxiaosong/ --cache_folder ./cache/
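The cache file names used by the training command (`logf0s_normalization.npz`, `mcep_normalization.npz`, `coded_sps_A_norm.pickle`, `coded_sps_B_norm.pickle`) suggest that preprocessing extracts F0 and spectral features per speaker and stores normalization statistics. A minimal sketch of those statistics, assuming log F0 is pooled over voiced frames and MCEPs are z-scored per dimension; feature extraction itself (e.g. with a WORLD-style vocoder) is omitted, and all function names are hypothetical.

```python
import math
from statistics import fmean, pstdev

def logf0_statistics(f0_utterances):
    """Mean/std of log F0 over voiced frames (F0 > 0), pooled across all utterances."""
    voiced = [math.log(f0) for utt in f0_utterances for f0 in utt if f0 > 0]
    return fmean(voiced), pstdev(voiced)

def mcep_statistics(mcep_utterances):
    """Per-dimension mean/std of MCEP frames; each utterance is a list of frames."""
    frames = [frame for utt in mcep_utterances for frame in utt]
    n_dims = len(frames[0])
    means = [fmean([frame[d] for frame in frames]) for d in range(n_dims)]
    stds = [pstdev([frame[d] for frame in frames]) for d in range(n_dims)]
    return means, stds

def normalize_mceps(utt, means, stds, eps=1e-8):
    """Z-score each MCEP dimension; eps guards against zero variance."""
    return [[(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
            for frame in utt]
```

Statistics are computed once per speaker (A and B separately) so that conversion can later map each speaker's features into the other's range.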
train
python train.py
is shorthand for (default arguments):
python train.py --logf0s_normalization ./cache/logf0s_normalization.npz --mcep_normalization ./cache/mcep_normalization.npz --coded_sps_A_norm ./cache/coded_sps_A_norm.pickle --coded_sps_B_norm ./cache/coded_sps_B_norm.pickle --model_checkpoint ./model_checkpoint/ --resume_training_at ./model_checkpoint/_CycleGAN_CheckPoint --validation_A_dir ./data/S0913/ --output_A_dir ./converted_sound/S0913 --validation_B_dir ./data/gaoxiaosong/ --output_B_dir ./converted_sound/gaoxiaosong/
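Non-parallel training only needs unaligned utterances from each speaker. One common way CycleGAN-VC implementations build training batches, assumed here rather than confirmed by this README, is to crop a random fixed-length window (128 frames is an assumed default) from each normalized utterance and pair A and B windows at random. The helpers below are hypothetical.

```python
import random

def sample_segments(utterances, n_frames=128, rng=random):
    """Crop one random n_frames-long window from each utterance long enough to hold it.
    Each utterance is a list of feature frames (e.g. normalized MCEP vectors)."""
    segments = []
    for utt in utterances:
        if len(utt) < n_frames:
            continue  # assumed policy: skip utterances shorter than the window
        start = rng.randint(0, len(utt) - n_frames)  # randint is inclusive on both ends
        segments.append(utt[start:start + n_frames])
    return segments

def make_unpaired_batches(utts_a, utts_b, n_frames=128, seed=0):
    """Pair random A and B segments with no alignment between them (non-parallel data)."""
    rng = random.Random(seed)
    seg_a = sample_segments(utts_a, n_frames, rng)
    seg_b = sample_segments(utts_b, n_frames, rng)
    rng.shuffle(seg_a)
    rng.shuffle(seg_b)
    n = min(len(seg_a), len(seg_b))
    return list(zip(seg_a[:n], seg_b[:n]))
```

Re-sampling windows every epoch gives the model many different A/B pairings from a small dataset, which is why speaker pairs need no parallel transcripts.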
Pretrained
A pretrained model that converts between S0913 and GaoXiaoSong
can be downloaded from Google Drive (735 MB).
Demo
Samples:
Reference speaker A: S0913 (./data/S0913/BAC009S0913W0351.wav)
Reference speaker B: GaoXiaoSong (./data/gaoxiaosong/gaoxiaosong_1.wav)
Speaker A's speech converted to speaker B's voice, from S0913 to GaoXiaoSong (./converted_sound/S0913/BAC009S0913W0351.wav)
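Converting speaker A's speech to speaker B's voice also requires mapping the pitch. The CycleGAN-VC papers convert F0 with a logarithm Gaussian normalized transformation based on each speaker's log-F0 mean and standard deviation, such as the statistics cached in `logf0s_normalization.npz`. A minimal sketch under that assumption; the function name is hypothetical, and spectral conversion by the generator and waveform synthesis are omitted.

```python
import math

def convert_f0(f0_source, mean_src, std_src, mean_tgt, std_tgt):
    """Map source F0 onto the target speaker's log-F0 distribution.

    mean_*/std_* are each speaker's log-F0 mean and standard deviation.
    Unvoiced frames (F0 <= 0) are kept at 0.
    """
    converted = []
    for f0 in f0_source:
        if f0 <= 0:
            converted.append(0.0)
        else:
            z = (math.log(f0) - mean_src) / std_src  # standardize in the log domain
            converted.append(math.exp(z * std_tgt + mean_tgt))  # rescale to the target speaker
    return converted
```

For example, a source frame at the source speaker's mean pitch maps exactly to the target speaker's mean pitch, while deviations are scaled by the ratio of the two log-F0 standard deviations.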
Star-History
Reference
- CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion. Paper, Project
- Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks. Paper, Project
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Paper, Project, Code
- Image-to-Image Translation with Conditional Adversarial Nets. Paper, Project, Code
Donation
If this project helps you save development time, you can buy me a cup of coffee :)
Alipay
data:image/s3,"s3://crabby-images/86e57/86e572bc81247874d930b874385365af8013c796" alt="ali_pay"
WeChat Pay
data:image/s3,"s3://crabby-images/d4397/d4397348642ffdb8cedc864d4edf7a544bcedaf9" alt="wechat_pay"
License
MIT © Kun