
How should we adapt the model to another language?

Open JohnHerry opened this issue 2 years ago • 0 comments

Hi, we are trying the single-speaker setup. We trained the model on LJSpeech, and the local style reference audio does indeed affect the prosody of the synthesized speech. But when we train on BZNSYP, a Mandarin dataset, the resulting model cannot transfer speaking style from the reference audio to the synthesized speech. Also, "model.synthesize_with_sample()", which uses random data as the LST, just produces chaotic audio in the speaker's voice. I am not sure whether this is because speech content is leaking into the model's LST. How should we adjust the model parameters to use it with another language? By the way, we are using wav2vec2-LARGE instead of wav2vec2-BASE, with emo-dim = 1024.

JohnHerry, Mar 31 '22 11:03