Retrieval-based-Voice-Conversion-WebUI
What are the differences between the two versions of pretrained models?
Hi, I wonder what the differences are between the v2 and v1 pretrained models. Maybe I missed something, but I didn't find many details about the pretrained models in the documentation.
They use the same training dataset (VCTK), right? So what are the improvements/adjustments between the two versions, apart from the additional support for the 32 kHz sample rate?
Thanks!
The v2 model changes the input from the 256-dimensional feature of HuBERT layer 9 + final_proj to the 768-dimensional feature of HuBERT layer 12, and adds three period discriminators. The training data is the same as v1.
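For illustration, here is a minimal sketch (not the repository's exact extraction code) of how the two versions consume HuBERT features differently, assuming a fairseq hubert_base.pt checkpoint and a 16 kHz mono input:

```python
import torch
from fairseq import checkpoint_utils

# Load the HuBERT base checkpoint (path is an assumption for this sketch)
models, _, _ = checkpoint_utils.load_model_ensemble_and_task(["hubert_base.pt"])
hubert = models[0].eval()

wav = torch.randn(1, 16000)  # placeholder: 1 second of 16 kHz audio
padding_mask = torch.zeros_like(wav, dtype=torch.bool)

with torch.no_grad():
    # v1: take the layer-9 output and project it to 256 dims with final_proj
    feats_v1, _ = hubert.extract_features(
        source=wav, padding_mask=padding_mask, output_layer=9
    )
    feats_v1 = hubert.final_proj(feats_v1)  # shape: (1, T, 256)

    # v2: take the layer-12 output directly, keeping the full 768 dims
    feats_v2, _ = hubert.extract_features(
        source=wav, padding_mask=padding_mask, output_layer=12
    )  # shape: (1, T, 768)
```

In both cases the same hubert_base checkpoint is used; only the tapped layer and the feature dimensionality fed to the generator differ.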
Where can I get the v2 version of hubert_base.onnx [256]?