metavoice-src What is required audio length for fine tuning?


Open risqaliyevds opened this issue 1 year ago • 4 comments


I split my audio into 5-10 second chunks. Is this normal for fine-tuning, or is there a specific range for audio chunks? I fine-tuned on my Uzbek-language audio (approximately 30 hours), and my loss is not decreasing.

Mar 28 '24 07:03 risqaliyevds

Hey! 5-10 seconds should be enough, but note that during synthesis you'll struggle to generate more than 5-10 seconds at one time due to this...

It's hard to debug a loss that isn't decreasing without more info!
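
For reference, here is a minimal sketch of how a recording could be cut into chunks that stay within the 5-10 second range discussed above. It is not taken from the metavoice-src codebase: `plan_chunks` and its `min_len` / `max_len` parameters are hypothetical names used only for illustration, and the function only computes cut points in seconds, so the actual audio slicing is left to whatever audio library you already use.

```python
import math


def plan_chunks(total_seconds, min_len=5.0, max_len=10.0):
    """Return (start, end) cut points, in seconds, covering a recording.

    Hypothetical helper for illustration only. It uses the fewest equal-length
    chunks that keep every chunk at or below max_len. When max_len >= 2 * min_len
    and the recording is longer than max_len, every chunk is also longer than
    min_len. Recordings no longer than max_len are returned as one chunk, so
    clips shorter than min_len must be filtered out by the caller.
    """
    if total_seconds <= 0:
        return []
    if total_seconds <= max_len:
        return [(0.0, float(total_seconds))]

    # Fewest chunks such that no chunk exceeds max_len.
    n_chunks = math.ceil(total_seconds / max_len)
    step = total_seconds / n_chunks

    # Build boundaries and pin the final one to the exact total duration
    # so floating-point rounding cannot drop or add audio at the end.
    bounds = [i * step for i in range(n_chunks)] + [float(total_seconds)]
    return list(zip(bounds[:-1], bounds[1:]))
```

For example, `plan_chunks(23.0)` yields three chunks of roughly 7.67 seconds each, and `plan_chunks(8.0)` returns the whole clip as a single chunk.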

Mar 30 '24 15:03 vatsalaggarwal

Hey @risqaliyevds, let us know if you have any more info, or we'll look to close this issue in the next few days.

Apr 03 '24 09:04 lucapericlp

I ran into a similar problem. Neither the training loss nor the validation loss is decreasing.

May 01 '24 22:05 eshoyuan

Could both of you provide more information about your finetuning configurations and the datasets you're using? As @vatsalaggarwal mentioned, 5-10s should be fine if that's an appropriate length at inference time. Are either of you able to get finetuning working with a non-custom dataset (e.g. LibriTTS, VCTK)?

May 14 '24 21:05 lucapericlp
