Sovits
An implementation of the combination of Soft-VC and VITS
Stella VC Based on Soft-VC and VITS
This project is closed...
Contents
- Update
- Introduction
- Models
- A Certain Magical Index
- Shiki Natsume
- Shiki Natsume 2.0
- How to use
- TODO
- Contact
- Acknowledgement
- References
Update
- Sovits 2.0 inference demo is available!
Introduction
Inspired by Rcell, I replaced the word embedding of TextEncoder in VITS with the output of the ContentEncoder used in Soft-VC to achieve any-to-one voice conversion with non-parallel data. Of course, any-to-many voice conversion is also doable!
For better voice quality, Sovits2 uses the F0 model from StarGANv2-VC to extract the fundamental frequency of the input audio and feeds it to the vocoder of VITS.
Models
A Certain Magical Index
- Description
Speaker | ID |
---|---|
一方通行 | 0 |
上条当麻 | 1 |
御坂美琴 | 2 |
白井黑子 | 3 |
- Model: Google drive
- Config: in this repository
- Demo
- Colab: Sovits (魔法禁书目录)
- BILIBILI: 基于Sovits的4人声音转换模型
Shiki Natsume
- Description
Single speaker model of Shiki Natsume.
- Model: Google drive
- Config: in this repository
- Demo
- Colab: Sovits (四季夏目)
- BILIBILI: 枣子姐变声器
Shiki Natsume 2.0
- Description
Single speaker model of Shiki Natsume, trained with F0 feature.
- Model: Google drive
- Config: in this repository
- Demo
- Colab: Sovits2 (四季夏目)
How to use
Train
Prepare dataset
Audio should be a mono wav file with a sampling rate of 22050 Hz.
Your dataset should be organized like this:
└───wavs
    ├───dev
    │   ├───LJ001-0001.wav
    │   ├───...
    │   └───LJ050-0278.wav
    └───train
        ├───LJ002-0332.wav
        ├───...
        └───LJ047-0007.wav
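Before extracting speech units, it can help to confirm that every file meets the format stated above. The following is a minimal sketch, not part of this repository: the helper names `check_wav` and `check_dataset` are hypothetical, and it relies only on Python's standard `wave` module, which reads PCM wav files (other encodings are reported as unreadable).

```python
import wave
from pathlib import Path

EXPECTED_RATE = 22050   # sampling rate required by this README
EXPECTED_CHANNELS = 1   # mono


def check_wav(path):
    """Return a list of format problems for one wav file (empty if none)."""
    problems = []
    try:
        with wave.open(str(path), "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
    except wave.Error as err:
        # The stdlib reader only understands PCM wav files.
        return [f"unreadable: {err}"]
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if rate != EXPECTED_RATE:
        problems.append(f"{rate} Hz, expected {EXPECTED_RATE} Hz")
    return problems


def check_dataset(root):
    """Map each offending wav file under root to its list of problems."""
    report = {}
    for path in sorted(Path(root).rglob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[str(path)] = problems
    return report
```

Calling `check_dataset("wavs")` on the layout above returns an empty dictionary when all files are mono at 22050 Hz; any file listed in the result needs to be resampled or downmixed before training.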
Extract speech units
Use the content encoder to extract speech units from the audio.
For more information, refer to this repo.
cd hubert
python3 encode.py soft path/to/wavs/directory path/to/soft/directory --extension .wav
Then you need to generate filelists for both your training and validation files. It's recommended that you prepare your filelists beforehand!
Your filelists should look like:
Single speaker:
path/to/wav|path/to/unit
...
Multi-speaker:
path/to/wav|id|path/to/unit
...
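The filelist formats above can be generated with a short script. This is a sketch under stated assumptions rather than a tool shipped with this repository: `build_filelist` and `write_filelist` are hypothetical helpers, and the script assumes each unit file shares its wav file's stem and uses a `.npy` extension, which you should adjust to match what `encode.py` actually wrote.

```python
from pathlib import Path


def build_filelist(wav_dir, unit_dir, speaker_id=None, unit_ext=".npy"):
    """Return filelist lines pairing each wav with its speech-unit file.

    Assumption: the unit file has the same stem as the wav and ends in
    `unit_ext`. Pass `speaker_id` to produce the multi-speaker format.
    """
    lines = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        unit = Path(unit_dir) / (wav.stem + unit_ext)
        if not unit.exists():
            continue  # skip wavs whose units were not extracted
        fields = [wav.as_posix()]
        if speaker_id is not None:
            fields.append(str(speaker_id))  # wav|id|unit
        fields.append(unit.as_posix())
        lines.append("|".join(fields))
    return lines


def write_filelist(lines, out_path):
    """Write one filelist entry per line."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For a single speaker, call `build_filelist` once per split (for example, on the `train` and `dev` directories) and write each result to its own file. For multiple speakers, call it once per speaker with the matching `speaker_id` and concatenate the returned lists before writing.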
Train Sovits
Single speaker:
python train.py -c configs/config.json -m model_name
Multi-speaker:
python train_ms.py -c configs/config.json -m model_name
You may also refer to train.ipynb
Inference
Please refer to inference.ipynb
TODO
- [x] Add F0 model
- [ ] Add F0 loss
Contact
QQ: 2235306122
BILIBILI: Francis-Komizu
Acknowledgement
Special thanks to Rcell for giving me both inspiration and advice!