whisper-vits-svc
whisper and hubert
Hi,
After reading the code, I found that the Whisper encoder output is used as the PPG and HuBERT is used as the Vec. Which HuBERT is this: discrete HuBERT after k-means, HuBERT soft, or just the output of a HuBERT hidden layer? And what is the advantage of mixing PPG and Vec?
Thanks~
Whisper is used so that each word is pronounced clearly, and HuBERT soft is used to fill in pronunciation details.
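For readers wondering what "mixing" two content features can look like in practice, here is a minimal sketch. It is not taken from this repository: the function name `fuse_features`, the feature widths (1024 for the PPG, 256 for the Vec), the hidden width of 192, and the project-then-add fusion are all assumptions chosen for illustration. The only idea borrowed from the thread is that a Whisper-derived PPG and a HuBERT-soft Vec are both fed to the model frame by frame.

```python
import numpy as np

def fuse_features(ppg, vec, w_ppg, w_vec):
    """Project two frame-level feature streams to a shared width and add them.

    ppg:   (frames, ppg_dim) array, e.g. Whisper encoder output (assumed shape)
    vec:   (frames, vec_dim) array, e.g. HuBERT-soft units (assumed shape)
    w_ppg: (ppg_dim, hidden) projection matrix (hypothetical, normally learned)
    w_vec: (vec_dim, hidden) projection matrix (hypothetical, normally learned)
    """
    # The two extractors may disagree by a frame or so; trim to the shorter one
    # so that row i of each stream refers to the same stretch of audio.
    n_frames = min(ppg.shape[0], vec.shape[0])
    ppg, vec = ppg[:n_frames], vec[:n_frames]
    # Linear projection of each stream, then element-wise sum.
    return ppg @ w_ppg + vec @ w_vec

# Random stand-ins for real features; all sizes here are illustrative assumptions.
rng = np.random.default_rng(0)
ppg = rng.standard_normal((101, 1024))
vec = rng.standard_normal((100, 256))
w_ppg = rng.standard_normal((1024, 192)) * 0.01
w_vec = rng.standard_normal((256, 192)) * 0.01

fused = fuse_features(ppg, vec, w_ppg, w_vec)
print(fused.shape)
```

With an additive fusion like this, each stream contributes its own term to every frame, so one stream can in principle carry what the other misses. Whether the actual model adds, concatenates, or combines the features some other way should be checked against the code.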
Did you train a Chinese version of HuBERT soft? Is there a reference?
https://github.com/fishaudio/chinese-hubert-soft
OK, thanks. I'll try to train a Chinese HuBERT soft using more data.
Thanks for the question. I'm wondering what would happen if I removed the Whisper PPG as an input, for example by feeding a fake Whisper PPG (such as all zeros). Have you tried something like this before?
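One way to run this kind of experiment is to build an all-zero placeholder with the same shape and dtype as the real PPG and compare the model output with and without it. The sketch below only shows the mechanics. `make_zero_ppg`, `ablation_gap`, and `toy_model` are hypothetical names, the array sizes are assumptions, and `toy_model` is a stand-in rather than the real network, so it says nothing about how the trained model would actually behave.

```python
import numpy as np

def make_zero_ppg(ppg):
    """Return an all-zero stand-in with the same shape and dtype as a real PPG."""
    return np.zeros_like(ppg)

def ablation_gap(model_fn, ppg, vec):
    """Mean absolute difference between outputs with the real and the zeroed PPG."""
    out_real = model_fn(ppg, vec)
    out_zero = model_fn(make_zero_ppg(ppg), vec)
    return float(np.mean(np.abs(out_real - out_zero)))

def toy_model(ppg, vec):
    """Toy stand-in for a trained model, for illustration only."""
    return ppg.mean(axis=1) + vec.mean(axis=1)

# Random stand-ins for real features; sizes are illustrative assumptions.
rng = np.random.default_rng(0)
ppg = rng.standard_normal((100, 1024)).astype(np.float32)
vec = rng.standard_normal((100, 256)).astype(np.float32)

gap = ablation_gap(toy_model, ppg, vec)
print(gap)
```

A gap near zero would suggest the model barely relies on the PPG, while a large gap would suggest it does. Note that a model trained with real PPGs has never seen all-zero input, so retraining with the PPG removed is probably the fairer comparison.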