Idea for improvement: RMSEnergyExtractor and more...

Open Mixomo opened this issue 2 years ago • 0 comments

I have an idea for a future version: the ability to train the model on the singer's expressivity by analyzing and extracting RMS energy values. This would also make the model much more expressive and responsive at the inference stage, since it could follow the expressivity of the input audio.

To be honest, I can't give you detailed references, since I don't know much about it, but I know it appears as "RMSEnergyExtractor" and that it is a feature extractor. It has already been implemented in other SVC projects and has worked great there.

The only sources I can point you to are:
1- https://github.com/fishaudio/fish-diffusion
2- https://github.com/fishaudio/fish-diffusion/blob/main/configs/svc_hifisinger_finetune.py (one of the configuration scripts that mentions RMSEnergyExtractor, plus a sort of data augmentation system that looks interesting)
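
To make the idea a bit more concrete, here is a minimal sketch of what frame-wise RMS energy extraction could look like. This is not the RMSEnergyExtractor from fish-diffusion; the function name `frame_rms` and the `frame_length` / `hop_length` defaults are my own assumptions, picked only for illustration. It uses plain Python so it runs without any audio library.

```python
import math


def frame_rms(samples, frame_length=2048, hop_length=512):
    """Return one RMS energy value per frame of a mono signal.

    `samples` is a sequence of floats. The frame and hop sizes are
    illustrative defaults, not values taken from any particular project.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")

    energies = []
    # Slide a window over the signal; a signal shorter than one frame
    # still yields a single (shorter) frame.
    last_start = max(len(samples) - frame_length, 0)
    for start in range(0, last_start + 1, hop_length):
        frame = samples[start:start + frame_length]
        if not frame:
            break  # empty input: nothing to measure
        mean_square = sum(x * x for x in frame) / len(frame)
        energies.append(math.sqrt(mean_square))
    return energies


if __name__ == "__main__":
    # A quiet half followed by a loud half: the RMS curve should rise.
    signal = [0.1] * 4096 + [0.8] * 4096
    curve = frame_rms(signal)
    print(len(curve), curve[0], curve[-1])
```

A curve like this, aligned with the other per-frame features, is the kind of loudness signal the model could be conditioned on during training and then driven by at inference.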

The other idea is to save a .pth to the weights folder each time a checkpoint is saved, instead of forcing the user to finish the whole training run before the .pth for the weights folder is generated.

Oh, and add DIO as a pitch extraction option for voice conversion in the inference stage, since it is available in the preprocessing stage for training but not in inference.

And I almost forgot: would it be possible to use ContentVec in addition to HuBERT?

Apr 25 '23 16:04 Mixomo