GST-Tacotron
GST-Tacotron copied to clipboard
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
GST-Tacotron-Pytorch
A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Update
Add support for blizzard dataset.
Requirements
pip3 install -r requirements.txt
File structure
-
Hyperparameters.py
--- hyperparameters -
Network.py
--- encoder and decoder -
Modules.py
--- some modules for tacotron -
Loss.py
--- loss function -
Data.py
--- dataset loader -
utils.py
--- some util functions for data I/O -
Synthesis.py
--- speech generation
How to train
- Download a multispeaker dataset
- Preprocess your data and implement your
get_XX_data
function inData.py
- Set hyperparameters in
Hyperparameters.py
- Make a directory named
log
as follow:
--- log
| |
| --- log[log_number]
|
--- code
|
--- Tacotron
|
--- train.py
|
--- Network.py
|
......
- Run train.py
python3 train.py [log_number] [dataset_size] [start_epoch]
[log_number]: the log directory number
[dataset_size]: int or all
[start_epoch]: which epoch start to train (0 if start from scratch )
for example:
python3 train.py 0 all 0
How to generate wav
Rungenerate.py
. Replace the text
in generate.py
with any chinese sentences as you like before running
The pretained model provided is trained on Chinese dataset, so it only supports chinese now.