Retrieval-based-Voice-Conversion-WebUI
[Suggestion] Extend the pitch range of RVC models
Is your feature request related to a problem? Please describe.
Currently, RVC models with pitch guidance seem to have an f0 range from 50 Hz to 1.1 kHz. When I feed in an audio sample outside of this range, the model produces distorted breath sounds or subharmonics of the fundamental.
Describe the solution you'd like
I would like to see this range extended, since humans are more than capable of going higher and lower than that (vocal fry, women's screams, and whistles are some examples). RMVPE's range appears to be set from 30 Hz to 8 kHz.
Describe alternatives you've considered
I have tried changing f0_min and f0_max to other values to test in real-time inference, but it has no effect. RMVPE does react to this a little, but the changes are not obvious.
Additional context
I recommend setting the lower bound to 20 Hz (the lower limit of human hearing) and the upper bound to roughly 8 kHz, like RMVPE, to cover the entire vocal range and beyond. Better yet, consider allowing the user to fully customize this in training and inference.
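
To illustrate the request, here is a minimal sketch of how a fixed f0 range can collapse out-of-range pitches onto the edge values, and how configurable bounds would avoid that. It assumes the model quantizes f0 onto a mel scale with a fixed number of coarse bins and clamps anything outside the range; I have not confirmed that this matches the project's actual code. The names hz_to_mel and quantize_f0 and the n_bins parameter are hypothetical and used only for this example.

```python
import math


def hz_to_mel(f0_hz: float) -> float:
    """Convert Hz to mel using the common 1127 * ln(1 + f / 700) formula."""
    return 1127.0 * math.log(1.0 + f0_hz / 700.0)


def quantize_f0(
    f0_hz: float,
    f0_min: float = 50.0,    # assumed current lower bound (from this issue)
    f0_max: float = 1100.0,  # assumed current upper bound (from this issue)
    n_bins: int = 256,       # hypothetical number of coarse pitch bins
) -> int:
    """Map a pitch in Hz to a coarse bin index.

    Unvoiced frames (f0 <= 0) map to bin 0. Voiced frames map to
    bins 1 .. n_bins - 1, and anything outside [f0_min, f0_max] is
    clamped to the nearest edge bin.
    """
    if f0_hz <= 0:
        return 0
    mel_min = hz_to_mel(f0_min)
    mel_max = hz_to_mel(f0_max)
    scaled = (hz_to_mel(f0_hz) - mel_min) * (n_bins - 2) / (mel_max - mel_min) + 1
    # Clamp out-of-range pitches to the first or last voiced bin.
    scaled = min(max(scaled, 1.0), float(n_bins - 1))
    return int(round(scaled))


if __name__ == "__main__":
    for hz in (30.0, 50.0, 440.0, 1100.0, 2000.0, 8000.0):
        narrow = quantize_f0(hz)
        wide = quantize_f0(hz, f0_min=20.0, f0_max=8000.0)
        print(f"{hz:>7.1f} Hz  narrow bin={narrow:>3}  wide bin={wide:>3}")
```

With the default bounds, 30 Hz lands in the same bin as 50 Hz, and 2 kHz or 8 kHz lands in the same bin as 1.1 kHz, so the pitch information is lost. With bounds of 20 Hz and 8 kHz those inputs get distinct bins. One trade-off: if the number of bins stays fixed, a wider range means coarser pitch resolution per bin, which is another reason to let the user choose the bounds.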