
Issues with preserving the speaker identity

Open justinjohn0306 opened this issue 1 year ago • 3 comments

Okay, so I've been testing out the demo Colab notebook and tried synthesizing a few characters, but it seems to have a hard time preserving the speaker identity. The resulting audio doesn't sound like my reference audio at all.

justinjohn0306 • Aug 03 '23 07:08

The pre-trained model was trained on the VCTK dataset, which is not large enough, so it may not work well on data in the wild. I am working on improving the model's generalization by modifying the network structure. For better results, you can fine-tune or train the model yourself.

adelacvg • Aug 03 '23 08:08

alright, gotcha :)

justinjohn0306 • Aug 03 '23 10:08

@adelacvg, do you have any thoughts on using EnCodec features rather than mel-spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.

rishikksh20 • Apr 30 '24 08:04