parler-tts: How big a dataset is needed to train the model?

I used 560+ hours of libritts_R data to train the model (187M) from scratch, but the audio the model synthesizes is not correct.

Is this because the dataset is not large enough?

Jul 09 '24 03:07 zyy-fc
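One way to frame "is 560 hours enough?" is to count how many discrete audio-codec tokens that much speech yields and compare that to the model size. The sketch below is a back-of-the-envelope estimate only. It assumes a DAC-style codec at roughly 86 frames per second with 9 codebooks per frame, and treats the "187M" in the question as the parameter count. None of these values are confirmed in the thread, and the helper names are hypothetical.

```python
# Back-of-the-envelope: how many audio-codec tokens does N hours of speech give?
# ASSUMPTIONS (not from the thread): a DAC-style codec at ~86 frames/s with
# 9 codebooks per frame, and "187M" read as the parameter count.
# Adjust these to match the codec and model actually used.


def hours_to_codec_tokens(hours: float, frame_rate_hz: float = 86.0, n_codebooks: int = 9) -> int:
    """Approximate number of discrete codec tokens for `hours` of audio."""
    seconds = hours * 3600
    frames = seconds * frame_rate_hz
    return int(round(frames * n_codebooks))


def tokens_per_parameter(tokens: int, n_params: float) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / n_params


if __name__ == "__main__":
    tokens = hours_to_codec_tokens(560)
    ratio = tokens_per_parameter(tokens, 187e6)
    # Prints: 560 h -> 1,560,384,000 codec tokens, ~8.3 tokens per parameter
    print(f"560 h -> {tokens:,} codec tokens, ~{ratio:.1f} tokens per parameter")
```

The count scales linearly with hours, so a corpus of tens of thousands of hours gives proportionally more tokens per parameter for the same model size.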

People have tens of thousands of hours of training data, but you have less than 600 hours of audio data and you want to produce excellent results? That's just nonsense, isn't it?

Jul 22 '24 04:07 ScottishFold007

Bro, have you ever trained the model from scratch? Could you please share your final train loss and eval loss? I trained on a 600-hour dataset and ended up with a loss of 4.1, so of course the model can't produce any useful speech... Thanks very much.

Sep 21 '24 10:09 gantuo
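To put a loss of 4.1 in perspective, a cross-entropy loss can be converted to a perplexity (the effective number of equally likely choices per token) and compared with a model that guesses uniformly. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats and that each codebook has 1024 entries. Neither assumption is confirmed in the thread, and the function names are hypothetical.

```python
import math

# ASSUMPTIONS (not confirmed in the thread): the reported loss is the mean
# per-token cross-entropy in nats, and each codebook has 1024 entries.


def perplexity(loss_nats: float) -> float:
    """Effective number of equally likely choices implied by a cross-entropy loss."""
    return math.exp(loss_nats)


def uniform_baseline_loss(codebook_size: int) -> float:
    """Cross-entropy (nats) of a model that guesses uniformly over the codebook."""
    return math.log(codebook_size)


if __name__ == "__main__":
    loss = 4.1
    # Prints: loss 4.1 -> perplexity ~60.3
    print(f"loss {loss} -> perplexity ~{perplexity(loss):.1f}")
    # Prints: uniform guess over 1024 codes -> loss ~6.93
    print(f"uniform guess over 1024 codes -> loss ~{uniform_baseline_loss(1024):.2f}")
```

Under these assumptions, a loss of 4.1 means the model is still choosing among roughly 60 plausible codes per token, compared with 1024 for random guessing.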