
Document how to build a speech synthesis system for new languages

Open r9y9 opened this issue 6 years ago • 11 comments

All you need is the following:

  • Wav files
  • Full-context labels
  • HTS-style question file

With all of those prepared, it should be very straightforward to implement.
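For reference, here is a minimal sketch (file paths are placeholders) of how the three ingredients could be loaded with nnmnkwii, assuming HTS-style full-context labels and a Merlin/HTS-style question file:

```python
from scipy.io import wavfile

from nnmnkwii.io import hts
from nnmnkwii.frontend import merlin as fe

# Placeholder paths; adjust to your own corpus layout.
wav_path = "data/wav/utt001.wav"
label_path = "data/label_phone_align/utt001.lab"
question_path = "data/questions.hed"

sr, waveform = wavfile.read(wav_path)  # audio for acoustic feature extraction
labels = hts.load(label_path)          # full-context labels
binary_dict, numeric_dict = hts.load_question_set(question_path)

# Phone-level linguistic features: one feature vector per label entry.
linguistic = fe.linguistic_features(labels, binary_dict, numeric_dict)
print(sr, waveform.shape, linguistic.shape)
```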

r9y9 avatar Aug 17 '17 17:08 r9y9

Hi, thank you for this work. About this issue: is the above recipe complete for tonal languages, especially those with rising and falling tones/pitch on vowels and nasal consonants?

ruohoruotsi avatar Dec 28 '17 19:12 ruohoruotsi

Yes. I think you will need to annotate the accent information (rising/falling tones) in the HTS-style labels.
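For illustration, a tone context could be encoded as an extra field in the full-context labels and then queried from the question file. This is only a hypothetical sketch: the "/T:" field and the tone symbols below are made up, and the actual scheme depends on the frontend you write.

```python
from nnmnkwii.io import hts

# Hypothetical QS entries matching a made-up "/T:<tone>_" context field
# that a custom frontend would write into each full-context label.
question_text = """\
QS "C-Tone_High" {*/T:H_*}
QS "C-Tone_Rising" {*/T:R_*}
QS "C-Tone_Falling" {*/T:F_*}
QS "C-Tone_Low" {*/T:L_*}
"""
with open("questions_tone.hed", "w") as f:
    f.write(question_text)

binary_dict, numeric_dict = hts.load_question_set("questions_tone.hed")
print(len(binary_dict), "binary (QS) tone questions loaded")
```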

r9y9 avatar Dec 29 '17 08:12 r9y9

Hi, Yamamoto-san. I'm trying to synthesize Mandarin using your tool. To my knowledge, I need to do forced alignment manually beforehand, and then write a frontend adapted to the language to extract linguistic features. So does that mean I only need to replace the frontend part? And could I use other forced alignment tools such as "montreal" at an alignment level that is neither 'state' nor 'phone', for example 'syllable'?

attitudechunfeng avatar Jan 03 '18 07:01 attitudechunfeng

Hello, @attitudechunfeng!

So does that mean I only need to replace the frontend part?

Yes, you can reuse the other parts. You can also reuse part of the frontend (https://r9y9.github.io/nnmnkwii/latest/references/frontend.html#frontend) to convert your linguistic features to a numeric representation at phone, state, or frame level if you use the HTS-style label format.
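As a sketch of what that reuse can look like (file names are placeholders), the same labels and question set can be turned into phone-level or frame-level input features, plus duration features:

```python
from nnmnkwii.io import hts
from nnmnkwii.frontend import merlin as fe

binary_dict, numeric_dict = hts.load_question_set("questions.hed")
labels = hts.load("utt001.lab")  # phone- or state-aligned full-context labels

# Phone-level input features: one vector per phone.
X_phone = fe.linguistic_features(labels, binary_dict, numeric_dict)

# Frame-level input features with subphone features
# (this needs state-level alignment in the label file).
X_frame = fe.linguistic_features(
    labels, binary_dict, numeric_dict,
    add_frame_features=True, subphone_features="full")

# Duration features for training a duration model.
durations = fe.duration_features(labels)
```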

And could I use other forced alignment tools such as "montreal" at an alignment level that is neither 'state' nor 'phone', for example 'syllable'?

You could, but then you cannot reuse https://r9y9.github.io/nnmnkwii/latest/references/frontend.html#frontend, since it assumes state or phone-level alignment.

r9y9 avatar Jan 03 '18 08:01 r9y9

Thank you very much! I'll try it.

attitudechunfeng avatar Jan 03 '18 08:01 attitudechunfeng

Alternatively, you could consider an end-to-end approach, which requires neither forced alignment nor linguistic feature extraction (the hard part of TTS!). See https://github.com/r9y9/deepvoice3_pytorch if you are interested.

r9y9 avatar Jan 03 '18 09:01 r9y9

Thank you. In fact, I'm also following your other excellent TTS projects. However, I'm now working on offline usage; end-to-end models are not convenient to deploy to mobile devices, and their speed on CPU can't be guaranteed either. So I have to use the traditional method.

attitudechunfeng avatar Jan 03 '18 09:01 attitudechunfeng

I see. I hope you find something useful. Let me know if you find something that should be improved.

r9y9 avatar Jan 03 '18 09:01 r9y9

Okay, if there's something interesting, I'll report it.

attitudechunfeng avatar Jan 03 '18 09:01 attitudechunfeng

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 30 '19 01:05 stale[bot]

All you need is the following:

  • Wav files
  • Full-context labels
  • HTS-style question file

With all of those prepared, it should be very straightforward to implement.

I have wav files for the Punjabi language. Please guide me on how to generate full-context labels and an HTS-style question file.

HarmanGhawaddi avatar Apr 10 '21 09:04 HarmanGhawaddi