metavoice-src
Finetune success for accented English?
Has anyone had success fine-tuning on a non-US/UK accent, and if so, could you share your training config? I fine-tuned for 50 epochs on ~30 minutes of good-quality Australian English but didn't see much improvement over the zero-shot result.
I had seen "We have had success with as little as 1 minute training data for Indian speakers." and interpreted that as success with Indian-accented English, but I'm now realizing that may not be what the authors meant.
Hey @kelseyjd, sorry for only getting to this now. It was indeed referring to Indian-accented English. Could you share more about how you assembled your dataset, your training configuration, your reference clip, and what the training dynamics look like?