NS2VC
Issues with preserving the speaker identity
Okay, so I've been testing out the demo Colab notebook and tried synthesizing a few characters, but it seems to have a hard time preserving the speaker identity. The resulting audio doesn't sound like my reference audio at all.
The pre-trained model is trained on the VCTK dataset, which is not large enough, so it may not work well on in-the-wild data. I am working on improving the model's generalization by modifying the network structure. For better results, you can fine-tune the model or train it yourself.
alright, gotcha :)
@adelacvg, do you have any thoughts on using Encodec's features rather than mel-spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.