Retrieval-based-Voice-Conversion-WebUI [Question] What are the mute wavs used for?

Open Rolun opened this issue 2 years ago • 2 comments

Hi,

There are 2 mute wav files that get included in the training data. 2 questions:

1. If I train on multiple speakers, should I include 2 of these per speaker, or is just the set of 2 in total enough?
2. How do they benefit the model (conceptually if nothing else)?

Many thanks in advance

Jul 13 '23 12:07 Rolun

Training audio is force-sliced into clips, so a small dataset may end up with no silent segments after slicing. The model then never learns how to handle silence, and silent segments at inference time may come out as noise. The extra mute wavs are added to the training data to address this.
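For anyone who wants to try this on their own data, here is a minimal sketch of the idea, not the repository's own code. It uses only Python's standard `wave` module. The function names `write_silent_wav` and `silent_fraction`, the 3 second length, the 40000 Hz sample rate and the threshold of 64 are all assumptions chosen for illustration. The first function writes an all-zero mono 16-bit clip, and the second estimates how much of an existing clip is already near-silent, which may help in judging whether a dataset needs extra mute clips.

```python
import struct
import wave


def write_silent_wav(path, seconds=3.0, sample_rate=40000):
    """Write a mono 16-bit PCM WAV file containing only zero samples.

    Hypothetical helper: the duration and sample rate are illustrative
    defaults, not values taken from the project.
    """
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # each frame is one zero sample
    return n_frames


def silent_fraction(path, threshold=64):
    """Return the fraction of samples with absolute value below `threshold`.

    Assumes a mono 16-bit PCM file; the threshold is an arbitrary
    illustrative value on the 16-bit integer scale.
    """
    with wave.open(path, "rb") as wav:
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    if n_frames == 0:
        return 0.0
    # Little-endian signed 16-bit integers, one per frame for mono audio.
    samples = struct.unpack("<%dh" % n_frames, raw)
    quiet = sum(1 for s in samples if abs(s) < threshold)
    return quiet / n_frames
```

Calling `write_silent_wav("mute.wav")` produces a clip for which `silent_fraction("mute.wav")` is 1.0. Running `silent_fraction` over your sliced clips gives a rough idea of whether silence is already represented in the dataset.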

Jul 13 '23 14:07 ms903x1

@ms903x1 - thanks! So by the sounds of it, mute wav files aren't needed for each speaker (or at all for longer datasets), just a couple in total in the dataset so the NSF-GAN can learn what silence is?

Jul 13 '23 18:07 Rolun

Yes.

Jul 16 '23 07:07 RVC-Boss
