parler-tts

How big a dataset is needed to train the model?

Open zyy-fc opened this issue 1 year ago • 2 comments

I trained the model (187M) from scratch on 560+ hours of libritts_R data, but the audio it synthesizes is not correct.

Is this because the dataset is not large enough?

zyy-fc · Jul 09 '24 03:07
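Before comparing dataset sizes, it helps to confirm how many hours a local copy of a corpus actually contains. The sketch below is an illustration, not part of parler-tts: it assumes the corpus is stored as uncompressed PCM WAV files (which the stdlib `wave` module can read) somewhere under one root directory, and `total_hours` is a hypothetical helper name.

```python
import wave
from pathlib import Path


def total_hours(root: str, pattern: str = "*.wav") -> float:
    """Sum the durations of all matching WAV files under `root`, in hours.

    Assumption: every file is uncompressed PCM WAV readable by `wave`.
    """
    seconds = 0.0
    for path in Path(root).rglob(pattern):
        with wave.open(str(path), "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0
```

Usage: `print(f"{total_hours('/path/to/corpus'):.1f} hours")`. Only the file headers are read, so this is fast even for a few hundred hours of audio.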

People have tens of thousands of hours of training data, but you have less than 600 hours of audio data and you want to produce excellent results? That's just nonsense, isn't it?

ScottishFold007 · Jul 22 '24 04:07

> People have tens of thousands of hours of training data, but you have less than 600 hours of audio data and you want to produce excellent results? That's just nonsense, isn't it?

bro, have you ever trained the model from scratch? Could you please tell me your final train loss and eval loss? I trained on a 600-hour dataset and got a loss of 4.1, and of course the model can't produce any useful speech... Thanks very much.

gantuo · Sep 21 '24 10:09
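To put a loss of 4.1 in perspective, one can convert it to a perplexity. This is a rough sketch under stated assumptions: it assumes the reported number is a mean per-token cross-entropy in nats (natural log, as in PyTorch's `CrossEntropyLoss`) over discrete audio-codec tokens, and that each codebook has 1024 entries. Both are assumptions, not confirmed by this thread. Under them, a loss of 4.1 corresponds to a perplexity of about 60, whereas guessing uniformly over 1024 tokens gives about 6.93 nats (perplexity 1024).

```python
import math


def perplexity(mean_ce_nats: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_ce_nats)


def uniform_baseline(vocab_size: int) -> float:
    """Cross-entropy (nats) of a model that guesses uniformly over `vocab_size` tokens."""
    return math.log(vocab_size)


loss = 4.1            # loss reported in the comment above
codebook_size = 1024  # assumption: entries per audio-codec codebook
print(f"perplexity ~ {perplexity(loss):.1f}")
print(f"uniform-guess loss ~ {uniform_baseline(codebook_size):.2f} nats")
```

Read this way, the model is still spreading its probability over roughly 60 candidate tokens per step, which is consistent with it not producing intelligible speech yet.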