Retrieval-based-Voice-Conversion-WebUI

[Question] What are the mute wavs used for?

Open Rolun opened this issue 2 years ago • 2 comments

Hi,

There are 2 mute wav files that get included in the training data. 2 questions:

  1. If I train on multiple speakers, should I include 2 of these per speaker, or is just the set of 2 in total enough?
  2. How do they benefit the model (conceptually, if nothing else)?

Many thanks in advance

Rolun, Jul 13 '23 12:07

Because of forced slicing, a small dataset may end up with no silent segments at all, so the model never learns how to handle silence. At inference time, silent stretches of the input may then come out as noise. The extra mute wavs are added to the training data to address this.
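
To make the idea concrete, here is a minimal sketch of how a silent clip could be generated with Python's standard `wave` module. It is only an illustration, not how the project's bundled mute wavs were produced; the function name `write_silent_wav`, the 3-second duration, and the 40000 Hz sample rate are all assumptions, so match the sample rate to whatever your training setup expects.

```python
import wave


def write_silent_wav(path, seconds=3.0, sample_rate=40000):
    """Write a mono 16-bit PCM WAV file containing only zero-valued samples.

    All defaults here are illustrative assumptions, not values taken
    from the project.
    """
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        # Each 16-bit sample of digital silence is two zero bytes.
        wf.writeframes(b"\x00\x00" * n_frames)
    return n_frames


if __name__ == "__main__":
    frames = write_silent_wav("mute_example.wav")
    print(f"wrote {frames} silent frames")
```

A clip like this contains only digital silence, which is the kind of segment a small, force-sliced dataset may otherwise be missing.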

ms903x1, Jul 13 '23 14:07

@ms903x1 - thanks! So by the sounds of it, there's no need for mute wav files per speaker (or at all for longer datasets), just a couple in total in the dataset so the NSF-GAN can learn what silence is?

Rolun, Jul 13 '23 18:07

Yes.

RVC-Boss, Jul 16 '23 07:07