How to reproduce the results on the VCTK dataset?
I ran make_spect.py and make_metadata.py to preprocess the dataset (I used all speakers in VCTK). I then used the pretrained speaker encoder to extract speaker embeddings and trained the model. The final loss is about 0.03. Has anyone reproduced the results successfully? Could you help me? Thanks!
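In case it helps compare setups, below is a minimal sketch of what that preprocessing does conceptually: mel spectrogram extraction followed by averaging the speaker encoder's embeddings over random crops. Every hyperparameter here (sample rate, FFT size, crop length, normalization range) is an assumption for illustration; the authoritative values are whatever make_spect.py and make_metadata.py in this repo actually use.

```python
# Hedged sketch of the preprocessing pipeline -- NOT the repo's exact code.
import numpy as np
import librosa
import torch

def wav_to_mel(path, sr=16000, n_fft=1024, hop=256, n_mels=80):
    """Load a waveform and convert it to a normalized log-mel spectrogram."""
    wav, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels,
        fmin=90, fmax=7600)
    # Log-scale and squash into [0, 1]; the exact range is assumed.
    mel = np.log10(np.maximum(1e-5, mel))
    mel = np.clip((mel + 5.0) / 5.0, 0.0, 1.0)
    return mel.T  # shape: (frames, n_mels)

def speaker_embedding(encoder, mels, crop_len=128):
    """Average the speaker encoder's output over random fixed-length crops,
    one crop per utterance. `encoder` is assumed to map a (1, crop_len,
    n_mels) tensor to a (1, dim) embedding."""
    embs = []
    for mel in mels:
        if mel.shape[0] < crop_len:
            continue  # skip utterances shorter than one crop
        start = np.random.randint(0, mel.shape[0] - crop_len + 1)
        crop = torch.from_numpy(mel[start:start + crop_len]).float()[None]
        with torch.no_grad():
            embs.append(encoder(crop).squeeze(0))
    return torch.stack(embs).mean(0)
```

If your normalization range or crop length differs from what the pretrained speaker encoder was trained on, the embeddings (and therefore the conversion) can degrade even though the reconstruction loss looks reasonable.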
After days of working on this project, I tried reproducing the results, but all I get is silence: no voice at all.
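One quick sanity check for the silent-output case (a hedged debugging sketch, not code from this repo): print basic statistics of the converted mel spectrogram and of the vocoder's waveform before writing the wav file. If the waveform's standard deviation is near zero, the vocoder is receiving an effectively flat spectrogram, so the problem is upstream (normalization mismatch or bad speaker embeddings), not in the vocoder itself.

```python
import numpy as np

def report_signal_stats(name, x):
    """Print basic statistics to see whether an array is effectively silent."""
    x = np.asarray(x, dtype=np.float32)
    print(f"{name}: min={x.min():.4f} max={x.max():.4f} "
          f"mean={x.mean():.4f} std={x.std():.4f}")

# Usage (variable names are placeholders for your own arrays):
# report_signal_stats("converted mel", mel_converted)
# report_signal_stats("waveform", waveform)
```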