SpeechSplit Question: Using a pretrained encoder for getting the speaker embedding.

Open nischal-sanil opened this issue 4 years ago • 3 comments

Hi,

Did you experiment with using a pretrained encoder to get the speaker embedding, similar to your previous work (AutoVC)?

PS: Amazing work by the way!

Thanks,

Jan 19 '21 07:01 nischal-sanil

@nischal-sanil Did you make it work?

Can you check my question, please? https://github.com/auspicious3000/SpeechSplit/issues/28

Jan 23 '21 08:01 FurkanGozukara

I have the same question, @auspicious3000. Here you use a one-hot encoded embedding with a length of 82 (the number of speakers the model was pretrained on), but could you generate a zero-shot general embedding like in AutoVC? If I am correct, the embedding used there was larger, so I assume you cannot use it here.

So to wrap up: if we consider only timbre conversion, this method with the pretrained weights works only on the 82 speakers it was trained and conditioned on?

Jun 04 '21 17:06 terbed

@terbed Yes. Unless you retrain the model.

Jun 04 '21 17:06 auspicious3000
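
For anyone landing here later, a minimal sketch of why the one-hot conditioning discussed above is limited to the training speakers. This is not code from the SpeechSplit repository: the function name `one_hot_speaker` and the constant `NUM_TRAINING_SPEAKERS` are hypothetical, and the only detail taken from the thread is the length of 82.

```python
# Illustrative only: names below are hypothetical, not from the SpeechSplit code.
NUM_TRAINING_SPEAKERS = 82  # length of the one-hot speaker code mentioned in the thread


def one_hot_speaker(index, num_speakers=NUM_TRAINING_SPEAKERS):
    """Return a one-hot speaker code for a speaker seen during training."""
    if not 0 <= index < num_speakers:
        # A speaker outside the training set has no slot in the vector.
        raise ValueError(
            f"speaker index {index} is outside the {num_speakers} training speakers"
        )
    code = [0.0] * num_speakers
    code[index] = 1.0
    return code


# A training speaker maps to exactly one active position.
emb = one_hot_speaker(5)
print(len(emb), emb.index(1.0))
```

An unseen speaker has no valid index here, whereas an encoder-based embedding (as in AutoVC) would be a dense vector with a different dimensionality, so the layers that consume the speaker code would have to be retrained to accept it.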
