Japanese sounds unnatural

michaellin99999 opened this issue 1 year ago (status: open, 3 comments)

I have combined the phoneme sets for all three languages (English, Chinese, and Japanese) and started fine-tuning on a dataset that contains speech in all three languages. The base model I use is the Chinese and English base. However, after 500 epochs, Chinese sounds good and English sounds good, but Japanese sounds unnatural. My understanding is that the phonemes are correct, but the tone is just not how Japanese is spoken. What can I do to improve this?

Here is a sample of the Japanese output: https://soundcloud.com/michael-lin-674069136/japanese-test

Nov 19 '24 07:11 michaellin99999
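The issue doesn't say how the three phoneme sets were combined. As a minimal sketch, assuming the base model stores its phoneme inventory as an ordered list whose index is the embedding row, one way to add Japanese symbols without disturbing the pretrained Chinese/English IDs is to keep the base list untouched and append only the symbols it lacks. The function name `merge_phoneme_sets` and both symbol lists are hypothetical illustrations, not taken from the project.

```python
from typing import Dict, List, Tuple


def merge_phoneme_sets(
    base_symbols: List[str], new_symbols: List[str]
) -> Tuple[List[str], Dict[str, int]]:
    """Extend a base model's symbol list without reordering it (hypothetical helper).

    Symbols already in the base list keep their original index, so the
    pretrained embedding rows still line up. Symbols that only the new
    language uses are appended at the end, in first-seen order.
    """
    merged = list(base_symbols)
    seen = set(merged)
    for sym in new_symbols:
        if sym not in seen:
            merged.append(sym)
            seen.add(sym)
    symbol_to_id = {sym: i for i, sym in enumerate(merged)}
    return merged, symbol_to_id


# Hypothetical example inventories (NOT the real model's symbol lists).
zh_en_symbols = ["_", "a", "ai", "b", "sh", "AA", "AE"]
ja_symbols = ["a", "i", "u", "e", "o", "N", "cl", "b", "sh"]

merged, symbol_to_id = merge_phoneme_sets(zh_en_symbols, ja_symbols)
print(merged)
```

With this approach the embedding table only needs `len(merged) - len(zh_en_symbols)` new rows, while every pretrained row keeps its meaning. Note that symbols shared across languages (here `a`, `b`, `sh`) reuse the same ID; whether that sharing helps or hurts Japanese naturalness is a design choice, and some setups instead keep per-language symbols distinct.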

Are you using it on Docker?

Nov 19 '24 17:11 eliteexod

I have tried both Docker and ONNX Runtime; both sound like this.

Nov 19 '24 17:11 michaellin99999

Hello, may I ask which pre-trained model you used for fine-tuning, how long you trained, and how you set up the config? The model I trained cannot produce complete sentences, and its speech sounds very strange.

Dec 06 '24 06:12 baishouwujianfei