How to improve the synthesized results?

Open sanjeevani279 opened this issue 3 years ago • 4 comments

I have trained the model for 200k steps, and the synthesized results are still extremely bad. I used a sampling rate of 22050 Hz and a batch size of 16.
[Image: loss_curve] This is how my loss curve looks after 200k steps. What can I do now to improve the synthesized audio?

sanjeevani279 avatar May 18 '22 09:05 sanjeevani279

@sanjeevani279 I have the same problem at 22050 Hz, while 16000 Hz is OK. Did you resolve this problem?

chazo1994 avatar Jun 07 '22 09:06 chazo1994

@chazo1994 Are you using 16000 Hz with a batch size of 16? How is the synthesis quality?

Summerxu86 avatar Jun 11 '22 07:06 Summerxu86

@Summerxu86 I trained two models, one at 22050 Hz and one at 16 kHz, both with a batch size of 48. The 16 kHz model converges faster, and its synthesized audio at 200k steps is much better than that of the 22050 Hz model.
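
When comparing the two sampling rates, one thing that may be worth ruling out is a mismatch between the audio rate and the framing settings, since the same hop and window sizes cover different amounts of time at 16000 Hz and 22050 Hz. The following is only a minimal sketch: `frame_stats`, `hop_length=256`, and `win_length=1024` are illustrative assumptions, not values taken from this repository's config.

```python
def frame_stats(sampling_rate: int, hop_length: int, win_length: int) -> dict:
    """Convert framing settings given in samples into time-based quantities."""
    return {
        # Time step between consecutive mel frames, in milliseconds.
        "frame_shift_ms": 1000.0 * hop_length / sampling_rate,
        # Duration of one analysis window, in milliseconds.
        "window_ms": 1000.0 * win_length / sampling_rate,
        # Number of mel frames produced per second of audio.
        "frames_per_second": sampling_rate / hop_length,
    }


if __name__ == "__main__":
    # Assumed (hypothetical) framing settings; replace with the real config values.
    for sr in (16000, 22050):
        print(sr, frame_stats(sr, hop_length=256, win_length=1024))
```

If phoneme durations from the aligner are converted to frame counts, the same sampling rate and hop length presumably need to be used there as in the mel extraction; otherwise the durations and the spectrogram frames would not line up.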

chazo1994 avatar Jun 23 '22 04:06 chazo1994

Did you use the US or EU LibriTTS dataset? I trained a US model, and its performance is still a bit different from the original model. Did you align the data with an MFA acoustic model using ARPA 2.0.0a?

zxixi-1 avatar Apr 20 '24 19:04 zxixi-1