Retrieval-based-Voice-Conversion-WebUI

Regarding inputted speaker content

Open suzhenghang opened this issue 1 year ago • 0 comments

During inference, retrieved features are used to replace the input features in order to prevent speaker identity leakage. Given that, how can we ensure that the generated speech still corresponds to the original input content, i.e. to what the input speaker actually said?

feats = (torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate + (1 - index_rate) * feats)
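To make the question concrete, here is a minimal sketch of what the quoted line appears to do, under assumptions that the post does not confirm: `npy` is taken to hold, for each input frame, a weighted average of that frame's nearest neighbours in a bank of target-speaker features. The names `retrieve_and_blend`, `bank`, and `k` are hypothetical, and a brute-force NumPy search stands in for whatever index the project actually uses. If this reading is right, content is preserved only to the extent that nearby vectors in feature space encode similar content, and to the extent that `index_rate` stays below 1 so part of the original features survives.

```python
import numpy as np


def retrieve_and_blend(feats, bank, index_rate=0.5, k=2):
    """Blend each input frame with a weighted average of its k nearest bank frames.

    feats: (T, D) content features of the input utterance.
    bank:  (N, D) target-speaker features (hypothetical stand-in for a real index).
    """
    # Squared Euclidean distance between every input frame and every bank frame: (T, N)
    d2 = ((feats[:, None, :] - bank[None, :, :]) ** 2).sum(axis=-1)
    # Indices and distances of the k closest bank frames per input frame: (T, k)
    nn = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, nn, axis=1)
    # Inverse-distance weights, normalised per frame (epsilon avoids division by zero)
    w = 1.0 / (nn_d2 + 1e-8)
    w /= w.sum(axis=1, keepdims=True)
    # Weighted average of the retrieved neighbours: (T, D)
    retrieved = (bank[nn] * w[:, :, None]).sum(axis=1)
    # Same convex combination as in the quoted line
    return retrieved * index_rate + (1 - index_rate) * feats
```

With `index_rate=0` the function returns the input features unchanged, and with `index_rate=1` it returns only the retrieved features, so the parameter trades off content fidelity against the amount of source-speaker information that is replaced.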

suzhenghang, Jul 10 '23 14:07