Why can't RVC handle emotional voices?
RVC can't properly clone voices that are crying, shouting, or screaming. Is there any way to solve this issue?
+1
+1 @yukiarimo ?
I’m also looking at this issue right now. I just figured out how to train from scratch, so I’ll let you know whether it helps and whether a “Good dataset is all you need.”
@yukiarimo do you mean training from scratch based on a specific language? If I’m correct, how can I train a specific language with emotions? Could you please share more details about it?
Yes, I’m training from scratch. To train the model to express emotions, you would need to:
- Probably create a new speaker in the base model, or train from scratch instead of fine-tuning (only if you have 2+ hours of data).
- Split your recordings into 3-20 s chunks. You can try reading a light novel with emotion, and when a line does something special, mark it with “…”, “?”, or “!” so the model can distinguish it from the usual monotone sentence that ends with “.”
- Play around with the config. There are multiple parameters to change! Have fun!
- If you are unable to record a good dataset yourself, hire a voice actress, as we did. Then you can have a super high-quality dataset at 48 kHz!
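For the chunking step, here is a minimal sketch of one way to cut a long WAV recording into pieces that all fall inside the 3-20 s range. It is not taken from RVC; the helper names `plan_chunks` and `split_wav` are hypothetical, it uses only Python's standard `wave` module, and it cuts at evenly spaced positions rather than at silences, which is a simplifying assumption (in practice you would probably want to cut at pauses).

```python
import math
import wave


def plan_chunks(total_frames, sample_rate, min_s=3.0, max_s=20.0):
    """Return (start, end) frame ranges, each between min_s and max_s long.

    Hypothetical helper: splits the recording into the fewest equal-sized
    chunks that respect max_s. Recordings shorter than min_s yield no chunks.
    """
    if max_s < 2 * min_s:
        # Equal splitting only guarantees the lower bound in this case.
        raise ValueError("max_s must be at least twice min_s")
    min_frames = int(min_s * sample_rate)
    max_frames = int(max_s * sample_rate)
    if total_frames < min_frames:
        return []
    n = math.ceil(total_frames / max_frames)
    # Integer arithmetic keeps the boundaries exact and gap-free.
    bounds = [i * total_frames // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def split_wav(src_path, out_prefix, min_s=3.0, max_s=20.0):
    """Write each planned chunk to '<out_prefix>_NNNN.wav'; return the paths."""
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        chunks = plan_chunks(src.getnframes(), src.getframerate(), min_s, max_s)
        for idx, (start, end) in enumerate(chunks):
            src.setpos(start)
            frames = src.readframes(end - start)
            path = f"{out_prefix}_{idx:04d}.wav"
            with wave.open(path, "wb") as dst:
                dst.setparams(params)  # same channels, sample width, rate
                dst.writeframes(frames)  # frame count in header is patched on close
            paths.append(path)
    return paths
```

For example, `plan_chunks(48000 * 50, 48000)` splits a 50 s recording at 48 kHz into three chunks of roughly 16.7 s each, and `split_wav("session.wav", "chunks/session")` would write them out (both file names are placeholders).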
@yukiarimo What's the trick to training it from scratch? By that, you mean you're not using RVC's original checkpoints anymore, right? Also, what do you mean by setting "..." or "?"? AFAIK, this model is not conditioned on text.
Just modify the code.
I have abandoned this project as I have trained raw VITS on two speakers at 48 kHz, and the conversion works perfectly with all emotions, prosody, and style in place.