Retrieval-based-Voice-Conversion-WebUI
Regarding inputted speaker content
During inference, retrieval features are used to replace the input features in order to prevent speaker identity leakage. But if the input features are replaced, how can we ensure that the generated speech still corresponds to the content of the original input, i.e. to what the input speaker actually said?
feats = ( torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate + (1 - index_rate) * feats )
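Read on its own, the line above is a linear interpolation between the retrieved features (`npy`) and the original input features (`feats`), weighted by `index_rate`. The following is a minimal sketch of that blend in NumPy rather than PyTorch, so it runs without a model. The function name `blend_features` and the toy arrays are assumptions made for illustration and are not taken from the repository. The snippet does not show how `npy` is retrieved, so this says nothing about how well the retrieved features themselves preserve content.

```python
import numpy as np


def blend_features(feats, retrieved, index_rate):
    """Hypothetical helper: mix retrieved features with the input features.

    index_rate = 1.0 uses only the retrieved features,
    index_rate = 0.0 keeps the input features unchanged.
    """
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be in [0, 1]")
    # Same arithmetic as the quoted line, without the tensor/device handling.
    return retrieved * index_rate + (1.0 - index_rate) * feats


# Toy data: 2 frames, 2 feature dimensions (real features are much larger).
feats = np.array([[1.0, 2.0], [3.0, 4.0]])
retrieved = np.array([[0.0, 0.0], [1.0, 1.0]])

only_input = blend_features(feats, retrieved, 0.0)
only_retrieved = blend_features(feats, retrieved, 1.0)
half = blend_features(feats, retrieved, 0.5)
```

With any `index_rate` below 1, a share of the original input features survives in the result, so the input is only fully replaced when `index_rate` is exactly 1.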