Retrieval-based-Voice-Conversion-WebUI

Question about sample rates

Open kalomaze opened this issue 2 years ago • 1 comments

Considering that 40 kHz sounds better than 48 kHz, is this because upsampling introduces duplicate data in the file that, algorithmically, makes the output voice sound 'more robotic' despite being trained at a higher sample rate than 40 kHz? Or is it unrelated to upsampling, and it's just harder for machine learning to replicate higher frequencies without noise, even if you give it 'real' 48 kHz audio?

I'm wondering whether 44 kHz training would be better, or whether 40 kHz was chosen because it helps outputs sound cleaner without losing much precision. Either that, or the hyperparameter setup for 48 kHz is suboptimal.

I also wonder if this means that 32 kHz would sound more lifelike, though obviously at lower fidelity. Abandoning the option without proper, extensive testing seems a bit overboard to me.
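For reference, here is a minimal sketch of the arithmetic behind the upsampling part of the question. It assumes ideal band-limited resampling and only computes Nyquist limits; it says nothing about how RVC itself resamples or trains. The function names are hypothetical and made up for illustration.

```python
from typing import Optional, Tuple


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def empty_band_after_upsampling(
    source_rate_hz: int, target_rate_hz: int
) -> Optional[Tuple[float, float]]:
    """Band (low, high) in Hz that holds no original content after upsampling.

    Assumes ideal band-limited resampling: nothing above the source Nyquist
    limit exists in the recording, so upsampling cannot add real content there.
    Returns None when the target rate is not higher than the source rate.
    """
    low = nyquist_hz(source_rate_hz)
    high = nyquist_hz(target_rate_hz)
    if high <= low:
        return None
    return (low, high)


if __name__ == "__main__":
    # Nyquist limits for the sample rates mentioned in this thread.
    for rate in (32000, 40000, 44100, 48000):
        print(f"{rate} Hz -> content up to {nyquist_hz(rate):.0f} Hz")

    # A 40 kHz source upsampled to 48 kHz has no real content in this band.
    print(empty_band_after_upsampling(40000, 48000))
```

So if a dataset was recorded at 40 kHz and then upsampled to 48 kHz, the band from 20 kHz to 24 kHz would carry no real information under this assumption. Whether that explains the difference in output quality is a separate question that would need testing.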

kalomaze, Jun 13 '23 09:06

32k may reappear in upcoming versions. The reason for temporarily abandoning 32k is that I found a better config for 32k and 48k, and the quality of v1 32k is not very good.

RVC-Boss, Jun 13 '23 14:06

The 0618 version now supports v2-32k and v2-48k.

RVC-Boss, Jun 18 '23 10:06