whisper-vits-svc

whisper and hubert

Open · wblgers opened this issue 1 year ago · 5 comments

Hi,

After reading the code, I found that the Whisper encoder output is used as the PPG (phonetic posteriorgram) and HuBERT is used as the Vec feature. I'm curious whether the HuBERT here is discrete HuBERT (after k-means), HuBERT-Soft, or just the output of a HuBERT hidden layer. Also, what is the advantage of mixing PPG and Vec?

Thanks~

wblgers · Oct 12 '23 03:10
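The three options in the question differ in what actually reaches the model. Below is a minimal numpy sketch with random stand-ins for the k-means codebook and the projection weights; every shape and name is hypothetical and not taken from whisper-vits-svc. Raw hidden features pass through unchanged, discrete units keep only the index of the nearest centroid per frame, and soft units apply a linear projection to the hidden features (in Soft-VC, as we understand it, that projection is trained to predict the distribution over the discrete units).

```python
import numpy as np

# Hypothetical sizes for illustration: T frames, D hidden channels
# (HuBERT-base uses 768), K k-means clusters, S soft-unit channels.
T, D, K, S = 6, 768, 100, 256
rng = np.random.default_rng(0)

# (a) Raw hidden-layer output: one continuous D-dim vector per frame.
hidden = rng.standard_normal((T, D))

# (b) Discrete units: index of the nearest k-means centroid per frame.
centroids = rng.standard_normal((K, D))  # stand-in for a fitted codebook

def discretize(h, c):
    # Squared Euclidean distance between every frame and every centroid, shape (T, K).
    d2 = (h ** 2).sum(axis=1, keepdims=True) - 2.0 * h @ c.T + (c ** 2).sum(axis=1)
    return d2.argmin(axis=1)  # one integer id per frame; fine detail is discarded

# (c) Soft units: a linear projection of the hidden features.
W = rng.standard_normal((D, S)) / np.sqrt(D)  # stand-in for trained weights
b = np.zeros(S)

def soft_units(h):
    return h @ W + b  # continuous (T, S) features

units = discretize(hidden, centroids)
soft = soft_units(hidden)
```

Roughly, the integer ids throw away more information (including some pronunciation detail) than continuous soft units do, which is the motivation Soft-VC gives for using soft units.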

Whisper is used so that each word is pronounced clearly, and HuBERT-Soft is used to fill in the finer pronunciation details.

MaxMax2016 · Oct 12 '23 03:10
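One way to read "mixing PPG and Vec" is that both content features are projected to a shared width and added frame by frame. This is a minimal numpy sketch under assumed sizes (1280 channels for a Whisper large encoder output, 256 for HuBERT-Soft, and a hypothetical hidden width of 192). It assumes both features were already brought to the same frame rate, uses random stand-in weights, and is not copied from the whisper-vits-svc source.

```python
import numpy as np

# Assumed channel sizes; HIDDEN is a hypothetical model width.
PPG_DIM, VEC_DIM, HIDDEN = 1280, 256, 192
rng = np.random.default_rng(0)

# Stand-ins for trained weights of two per-frame linear projections
# (equivalent to 1x1 convolutions over time).
W_ppg = rng.standard_normal((PPG_DIM, HIDDEN)) / np.sqrt(PPG_DIM)
b_ppg = np.zeros(HIDDEN)
W_vec = rng.standard_normal((VEC_DIM, HIDDEN)) / np.sqrt(VEC_DIM)
b_vec = np.zeros(HIDDEN)

def fuse_content(ppg, vec):
    """Project PPG (T, PPG_DIM) and Vec (T, VEC_DIM) to HIDDEN channels and sum per frame."""
    if ppg.shape[0] != vec.shape[0]:
        raise ValueError("PPG and Vec must have the same number of frames")
    return (ppg @ W_ppg + b_ppg) + (vec @ W_vec + b_vec)

T = 10
ppg = rng.standard_normal((T, PPG_DIM))
vec = rng.standard_normal((T, VEC_DIM))
out = fuse_content(ppg, vec)  # shape (T, HIDDEN)
```

Adding two projections is equivalent to concatenating the features and applying a single linear layer, because [p, v] · [W_ppg; W_vec] = p · W_ppg + v · W_vec. At this step, "sum" versus "concat" therefore mostly changes code layout rather than expressiveness.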

Did you train a Chinese version of HuBERT-Soft? Is there a reference implementation?

wblgers · Oct 12 '23 03:10

https://github.com/fishaudio/chinese-hubert-soft

MaxMax2016 · Oct 12 '23 03:10

OK, thanks. I'll try to train a Chinese HuBERT-Soft using more data.

wblgers · Oct 12 '23 08:10

Thanks for the question. I'm wondering what would happen if I removed the Whisper PPG as an input, or fed in a fake Whisper PPG instead (e.g., all zeros). Have you tried anything like this before?

panxin801 · Jun 19 '24 08:06
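If the two content features are fused by a simple sum of linear projections (a hypothetical assumption, not confirmed against the repo), an all-zero PPG is easy to reason about: the PPG projection maps zeros to its bias, so every frame gets the same constant offset and all frame-varying content comes from the Vec branch. The numpy sketch below uses assumed sizes (1280 for Whisper large, 256 for HuBERT-Soft, hypothetical width 192) and random stand-in weights. A model trained with real PPGs would likely treat zeros as out-of-distribution input, and the rest of the network is nonlinear, so this only shows what enters the encoder, not how the converted audio would sound.

```python
import numpy as np

# Assumed sizes; HIDDEN is hypothetical.
PPG_DIM, VEC_DIM, HIDDEN = 1280, 256, 192
rng = np.random.default_rng(1)

# Stand-in weights; b_ppg is nonzero so its effect is visible.
W_ppg = rng.standard_normal((PPG_DIM, HIDDEN)) / np.sqrt(PPG_DIM)
b_ppg = 0.1 * rng.standard_normal(HIDDEN)
W_vec = rng.standard_normal((VEC_DIM, HIDDEN)) / np.sqrt(VEC_DIM)
b_vec = np.zeros(HIDDEN)

def fuse_content(ppg, vec):
    # Hypothetical sum fusion of the two projected content features.
    return (ppg @ W_ppg + b_ppg) + (vec @ W_vec + b_vec)

def fake_ppg_like(vec):
    # The "fake" PPG from the question: all zeros, same frame count as Vec.
    return np.zeros((vec.shape[0], PPG_DIM))

T = 8
vec = rng.standard_normal((T, VEC_DIM))
fused = fuse_content(fake_ppg_like(vec), vec)
vec_only = vec @ W_vec + b_vec
# With a zero PPG, the only difference from the Vec branch alone is b_ppg on every frame.
offset = fused - vec_only
```

Retraining with the PPG branch removed is a different experiment: the model would then learn to rely on Vec alone from the start.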