
Default index ratio increased again in latest release - negatively impacting performance for little to no gain?

Open kalomaze opened this issue 1 year ago • 3 comments


I noticed that the default index ratio seems to have changed again. I would like to know the reasoning, because even after experimenting with the other options (the one for voiceless consonants, and crepe with the median filter), the new default still underperforms the old one. 0.75 is probably a good default to stick with, or maybe a sweet spot of 0.7, considering how much a nearly maxed-out index ratio seems to cause raspy sounds and how little the index actually helps with accuracy.

QuickExample.zip
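For context, here is a minimal sketch of how I assume the index ratio behaves: as a linear blend weight between the features extracted from the input audio and the features retrieved from the training index. This is my assumption about the mechanism, not code taken from the repository, and `blend_features` and the toy vectors are hypothetical names and values used only for illustration.

```python
def blend_features(original, retrieved, index_rate):
    """Linearly mix retrieved index features into the original features.

    Hypothetical helper: assumes the index ratio is a plain blend weight.
    index_rate = 0.0 keeps the input features untouched,
    index_rate = 1.0 replaces them entirely with the retrieved ones.
    """
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be between 0 and 1")
    if len(original) != len(retrieved):
        raise ValueError("feature vectors must have the same length")
    return [
        index_rate * r + (1.0 - index_rate) * o
        for o, r in zip(original, retrieved)
    ]


if __name__ == "__main__":
    # Toy feature vectors, not real model features.
    original = [0.0, 1.0, 2.0]
    retrieved = [1.0, 1.0, 0.0]
    for rate in (0.0, 0.7, 0.75, 1.0):
        print(rate, blend_features(original, retrieved, rate))
```

If the blend really works like this, a ratio near 1.0 means the output is driven almost entirely by the retrieved training features, which would fit with the raspiness I hear at the new default.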

kalomaze, May 30 '23 20:05

This problem only exists when the training set is limited and the quality is average. The answer to this question depends on whether you care more about timbre similarity or sound quality.

RVC-Boss, May 31 '23 02:05

This model was trained with v2 on studio-quality acapellas. No UVR was used whatsoever. On top of that, it used ~30 minutes of data, all of the same studio quality. Is the index meant to be turned up only for very large datasets (hours' worth)? That confuses me, since '10 minutes' is the recommendation given in this repository for a quality model, and this is over 3 times that, with no lossy AI isolation involved.

Quality in spek (spectral analysis tool): [image]

0_gt_wavs.zip A sample of the preprocessed wavs before feature extraction to demonstrate how it sounds.

When used at a proper index rate, this is a very high-quality model.

kalomaze, May 31 '23 20:05

I will use your gt wavs and give it a try.

RVC-Boss, Jun 01 '23 16:06