EmotiVoice: How many data samples would I need to fine-tune a new voice with a stable prompt?

Open JacopoMangiavacchi opened this issue 1 year ago • 5 comments

Thank you very much for sharing the recipe for fine-tuning on the LJSpeech dataset. I'm wondering whether I can still train a new voice with a smaller dataset. With other model architectures I was able to clone a voice using about 1 hour of training data. Would that be enough for EmotiVoice?

Thanks!

Jan 23 '24 21:01 JacopoMangiavacchi

Yes, I believe that one hour of training data should be sufficient for EmotiVoice's Voice Cloning.
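As a rough way to check how much audio a dataset contains before fine-tuning, here is a minimal sketch that sums the duration of every WAV file under a folder. It is not part of EmotiVoice: the function name `total_wav_hours` is hypothetical, and it assumes the recordings are uncompressed PCM WAV files that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path


def total_wav_hours(wav_dir):
    """Return the combined duration, in hours, of all .wav files under wav_dir.

    Hypothetical helper, not part of EmotiVoice. Assumes uncompressed PCM WAV
    files, which is what the standard-library wave module supports.
    """
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 3600.0
```

Calling `total_wav_hours` on the folder that holds the recordings returns the total in hours, which can then be compared against the roughly one hour discussed here.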

Jan 24 '24 05:01 syq163

Thank you! I've been fine-tuning a new voice, but I'm having trouble running inference with it. In step 5 of the LJSpeech fine-tuning recipe, the call to python inference_am_vocoder_exp.py does not pass the --logdir parameter, yet the script appears to treat it as a mandatory argument. I'm confused about which value to pass here.

Jan 26 '24 19:01 JacopoMangiavacchi

It looks like I can pass '.' as logdir so that the right path gets concatenated, but then the script complains about a missing config.json file in the exp/LJspeech/tmp/ folder. I can't find this config.json file. What should it contain?

Jan 26 '24 19:01 JacopoMangiavacchi

'logdir' is a required argument for 'inference_am_vocoder_joint.py', but it is not utilized in 'inference_am_vocoder_exp.py'.

Jan 29 '24 02:01 syq163

Thank you again @syq163, I was able to run inference using the WangZeJun/simbert-base-chinese BERT features. I see that the script downloads these directly from the HF repo. I only found the content and style subfolders in the exp/LJspeech/tmp/ folder.

Jan 29 '24 19:01 JacopoMangiavacchi
