Retrieval-based-Voice-Conversion-WebUI

The standard for most recorded music is 44.1 kHz, not 40 kHz

Open: kalomaze opened this issue

I've been told ContentVec HuBERT was trained at 44 kHz, but RVC offers three options: 40 kHz, 48 kHz, and 32 kHz. This confuses me, because the standard for most recorded music is 44.1 kHz. That means you must always pick 48 kHz to avoid a roughly 10% loss (a 40 kHz model can only represent frequencies up to 20 kHz, while 44.1 kHz audio reaches 22.05 kHz), and yet 48 kHz is not the default option. Ideally, a 44.1 kHz pretrained weight would replace the 32 kHz one, making 44.1 kHz the default, with 40 kHz and 48 kHz as alternative options.

It would also be interesting to implement a method that scans the WAVs during preprocessing for their 'true' sample rate, based on the metadata, and adjusts the training weights as necessary.

kalomaze, May 05 '23
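
As a possible starting point for the metadata scan suggested above, here is a minimal Python sketch that reads the sample rate declared in each WAV header and tallies it. The helper name `scan_sample_rates` is hypothetical and not part of RVC. It uses the standard-library `wave` module, which handles PCM WAV files only, and it matches lowercase `.wav` extensions only. Note that a header rate is not necessarily the 'true' rate: 44.1 kHz material that was later upsampled to 48 kHz still reports 48000 Hz, so catching that case would require inspecting the spectrum, not just the metadata.

```python
import wave
from collections import Counter
from pathlib import Path


def scan_sample_rates(dataset_dir: str) -> Counter:
    """Count the sample rates declared in the headers of all .wav files
    under dataset_dir (hypothetical helper, not part of RVC)."""
    rates = Counter()
    # rglob is case-sensitive on most systems, so "*.WAV" files are skipped.
    for path in sorted(Path(dataset_dir).rglob("*.wav")):
        # Only header information is used here; no frames are read.
        with wave.open(str(path), "rb") as wav:
            rates[wav.getframerate()] += 1
    return rates
```

For example, `scan_sample_rates("dataset/").most_common()` lists the declared rates from most to least frequent. How (or whether) to map that result onto the 32k/40k/48k configs is deliberately left open, since the thread has not settled which config suits 44.1 kHz sources.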

It looks like the change in the above pull request (which made 48 kHz the default) was rolled back after I showed an example of a 40 kHz model beating a 48 kHz model trained on 44.1 kHz audio. This very much needs more testing to see which is concretely 'better'; maybe the default config for 48 kHz is simply suboptimal. There is also a chance that 48 kHz audio works best with the 48 kHz config, while 44.1 kHz audio (most music and all UVR isolations) works better with 40 kHz, because you're not exporting beyond the source's 'real frequency' range: upsampling 44.1 kHz audio to 48 kHz adds no content above 22.05 kHz.

kalomaze, May 10 '23
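
The 'real frequency' argument comes down to the Nyquist limit: audio sampled at rate fs can only represent frequencies below fs / 2. The sketch below is idealized arithmetic (real resampling filters roll off slightly below Nyquist), and the function names are illustrative, not taken from RVC. It shows that a 40k model on 44.1 kHz audio keeps 20 kHz of the available 22.05 kHz (about 9.3% of the bandwidth dropped, close to the ~10% figure from the issue), while a 48k model keeps all of it but has a 22.05 to 24 kHz band with no real content to learn from.

```python
def effective_bandwidth_hz(source_sr: int, model_sr: int) -> float:
    """Highest frequency that can carry real content when audio recorded at
    source_sr is trained/exported at model_sr (Nyquist limit = rate / 2)."""
    # Downsampling discards everything above model_sr / 2, and upsampling
    # cannot create content above source_sr / 2, so the lower rate wins.
    return min(source_sr, model_sr) / 2


def bandwidth_lost(source_sr: int, model_sr: int) -> float:
    """Fraction of the source's bandwidth discarded (0.0 if none)."""
    return 1 - effective_bandwidth_hz(source_sr, model_sr) / (source_sr / 2)


# Compare the three RVC config rates against a 44.1 kHz source.
for model_sr in (32000, 40000, 48000):
    print(
        f"44.1 kHz source -> {model_sr // 1000}k model: "
        f"{effective_bandwidth_hz(44100, model_sr):.0f} Hz usable, "
        f"{bandwidth_lost(44100, model_sr):.1%} of source bandwidth lost"
    )
```

This arithmetic only bounds what each config can represent; it does not predict which model will sound better, which is exactly what the comment above says still needs testing.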

I guess the reason is that RVC's pretrained model was trained on the VCTK dataset, which has a sampling rate of 48 kHz.

yxlllc, May 12 '23

Regarding "contentvec hubert was trained at 44khz": the audio is downsampled to 16 kHz first anyway. So in terms of performance, 44 kHz is the same as 40 kHz or 48 kHz.

RVC-Boss, May 29 '23
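
To illustrate RVC-Boss's point, the arithmetic below computes the integer up/down factors a polyphase resampler would use to bring each candidate rate to 16 kHz. It is a sketch, not RVC's actual resampling code: `resample_factors` and `CONTENT_ENCODER_SR` are hypothetical names, and the 16 kHz figure is taken from the comment above. Whatever the input rate, the content encoder ends up seeing the same 8 kHz bandwidth. This covers only the encoder input; by itself it says nothing about the rate the model is trained and exported at.

```python
from fractions import Fraction

# Hypothetical constant: the 16 kHz rate audio is downsampled to before
# ContentVec/HuBERT feature extraction, per the comment above.
CONTENT_ENCODER_SR = 16000


def resample_factors(source_sr: int, target_sr: int = CONTENT_ENCODER_SR) -> tuple[int, int]:
    """Smallest integers (up, down) with source_sr * up / down == target_sr."""
    ratio = Fraction(target_sr, source_sr)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator


for sr in (40000, 44100, 48000):
    up, down = resample_factors(sr)
    print(
        f"{sr} Hz -> up {up}, down {down} -> {sr * up // down} Hz, "
        f"content features limited to {CONTENT_ENCODER_SR // 2} Hz"
    )
```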