training tricks?

Open sipie800 opened this issue 1 year ago • 0 comments

I trained with 20 minutes of high-quality speech data, batch size 8, for 40 epochs, with pitch guidance on. The loss decreased from 32 to around 25. The audio-to-audio result is far from faithful to the trained voice; it is much worse than a simple one-shot result with VITS TTS. There is no overfitting, and the output sound quality is as good as the input audio, with no artificial distortion.

Here are some questions about whether RVC can be trained successfully with careful tuning:

What loss level is typical when training converges?
My audio data contains blank (silent) stretches, though they make up a very small percentage of the 20 minutes. The sliced audio may include completely blank pieces. Can that hurt training to the point that there is no fidelity at all?
What is the most common cause of under-fitting in RVC training? I don't think increasing the number of epochs is the answer. Can you provide some standard rules for training data quality?
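
For the blank-slice question, a check like the following could show how many slices are actually silent. This is only a minimal sketch and not part of RVC: the function names `rms_dbfs` and `find_silent_slices` and the -50 dBFS threshold are my own assumptions, and it expects each slice as a float NumPy array scaled to [-1, 1].

```python
import numpy as np


def rms_dbfs(samples: np.ndarray) -> float:
    """RMS level of a float signal in dB relative to full scale (1.0)."""
    x = np.asarray(samples).astype(np.float64)
    rms = np.sqrt(np.mean(x * x))
    # Clamp to avoid log10(0) on fully blank slices.
    return float(20.0 * np.log10(max(rms, 1e-10)))


def find_silent_slices(slices, threshold_db=-50.0):
    """Return indices of slices that are empty or quieter than threshold_db.

    threshold_db is an assumed cutoff; tune it for your recordings.
    """
    silent = []
    for i, s in enumerate(slices):
        if len(s) == 0 or rms_dbfs(s) < threshold_db:
            silent.append(i)
    return silent


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    voiced = 0.3 * np.sin(2 * np.pi * 220 * t)  # stand-in for speech
    blank = np.zeros(sr)                        # totally blank slice
    print(find_silent_slices([voiced, blank, voiced]))
```

Slices flagged this way could be listened to or dropped before training, which would at least rule blank pieces in or out as the cause.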

Sep 26 '24 00:09 sipie800