Real-Time-Voice-Cloning
emotion voice cloning
I used the method from this repo, https://github.com/innnky/emotional-vits, to try to implement emotion voice cloning. I fine-tuned the pretrained synthesizer on a small dataset of about 24 speakers, each with 100 audio clips; those 100 clips are divided into roughly four or five categories, with the same text within each category but spoken with different emotions. I run inference with the fine-tuned synthesizer and the pretrained encoder and vocoder, but the results are not very good. Does anyone know what the problem is, or how it should be trained?
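For context, the emotional-vits approach conditions the synthesizer on a per-utterance emotion embedding in addition to the speaker embedding. Below is a minimal sketch of that conditioning step in pure NumPy; the embedding sizes and the concatenation strategy are illustrative assumptions, not the exact code of either repo (only the 256-dim speaker embedding matches Real-Time-Voice-Cloning):

```python
import numpy as np

def build_conditioning(speaker_embed: np.ndarray, emotion_embed: np.ndarray) -> np.ndarray:
    """Concatenate speaker and emotion embeddings into one conditioning vector.

    Both vectors are L2-normalized first so neither dominates the other
    (a hypothetical design choice; emotional-vits has its own conditioning scheme).
    """
    s = speaker_embed / (np.linalg.norm(speaker_embed) + 1e-8)
    e = emotion_embed / (np.linalg.norm(emotion_embed) + 1e-8)
    return np.concatenate([s, e])

# Example: the 256-dim speaker embedding used by Real-Time-Voice-Cloning
# plus a hypothetical 128-dim emotion embedding.
speaker = np.random.rand(256).astype(np.float32)
emotion = np.random.rand(128).astype(np.float32)
cond = build_conditioning(speaker, emotion)
print(cond.shape)  # (384,)
```

If the fine-tuned synthesizer was trained with such a concatenated vector but inference feeds it only the encoder's 256-dim speaker embedding, the dimension mismatch (or a zeroed emotion slot) would explain poor output.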
I am not sure about the quality either. If I use the provided samples, I can generate reasonably good speech; if I use my own audio (e.g., recorded through the UI), I could not produce any usable output.