How to add new language

Open mirfan899 opened this issue 5 years ago • 4 comments

I have lexicon.scm, questions.hed and phoneset.scm, plus audio and text data. I tried to build a Chinese TTS voice but failed: I'm stuck at prepare_labels_from_txt.sh. The docs do not yet describe a straightforward way to build a TTS voice for a new language.

./scripts/prepare_labels_from_txt.sh database/utts.data database/labels conf/global_settings.cfg 
creating a scheme file from text file
generating utts from scheme file
converting festival utts to labels...
ASR73
ASR74
ASR75
ASR76
ASR77
ASR78
ASR79
ASR8
ASR80
ASR81
ASR82
ASR83
ASR84
ASR85
ASR86
ASR87
ASR88
ASR89
ASR9
ASR90
ASR91
ASR92
ASR93
ASR94
ASR95
ASR96
ASR97
ASR98
ASR99
...
normalizing label files for merlin...
...
ASR1.lab
ASR10.lab
ASR74.lab
ASR75.lab
ASR76.lab
ASR77.lab
ASR78.lab
ASR79.lab
ASR8.lab
ASR80.lab
ASR81.lab
ASR82.lab
ASR83.lab
ASR84.lab
ASR85.lab
ASR86.lab
ASR87.lab
ASR88.lab
ASR89.lab
ASR9.lab
ASR90.lab
ASR91.lab
ASR92.lab
ASR93.lab
ASR94.lab
ASR95.lab
ASR96.lab
ASR97.lab
ASR98.lab
ASR99.lab
Labels are ready in: database/labels/prompt-lab !!
root@instance-2:/home/virtuoso_irfan/merlin/egs/build_your_own_voice/s1# cat database/labels/prompt-lab/ASR1.lab 
root@instance-2:/home/virtuoso_irfan/merlin/egs/build_your_own_voice/s1# cat database/labels/prompt-lab/ASR2.lab

The label files are empty, and the .utt files contain garbled text.
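
Rather than running `cat` on label files one at a time, the output directory can be scanned for zero-byte files. The following is a minimal sketch: the function name `find_empty_labels` is made up for illustration, and the directory is the one printed by the script above.

```python
from pathlib import Path


def find_empty_labels(label_dir):
    """Return the sorted names of .lab files in label_dir that are zero bytes long."""
    return sorted(
        path.name
        for path in Path(label_dir).glob("*.lab")
        if path.stat().st_size == 0
    )


if __name__ == "__main__":
    # Path taken from the script output above; adjust for your own setup.
    empty = find_empty_labels("database/labels/prompt-lab")
    print(f"{len(empty)} empty label file(s)")
    for name in empty:
        print(name)
```

If every file is listed, the problem is upstream of label normalisation, which matches the garbled .utt contents.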

cat database/labels/prompt-utt/ASR1.utt 
EST_File utterance
DataType ascii
version 2
EST_Header_End
Features max_id 101 ; type Text ; iform "\"一二三四五六七八九十。十一十二十三十四十五十六十七十八十九二十\"" ; 
Stream_Items
1 id _1 ; name 一二三四五六七八九十。十一十二十三十四十五十六十七十八十九二十 ; whitespace "" ; prepunctuation "" ; 
2 id _2 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.2298 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
3 id _3 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
4 id _4 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
5 id _5 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
6 id _6 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
7 id _7 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
8 id _8 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
9 id _9 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
10 id _10 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
11 id _11 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
12 id _12 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
13 id _13 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
14 id _14 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
15 id _15 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
16 id _16 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
17 id _17 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
18 id _18 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ; 
19 id _19 ; name � ; pos nn ; pos_index 8 ; pos_index_score 0 ; phr_pos n ; phrase_score -5.15147 ; pbreak_index 1 ; pbreak_index_score 0 ; pbreak NB ;
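
The `�` tokens are what one would expect if the Chinese text had been split into single bytes rather than characters: each of these characters occupies three bytes in UTF-8, and no single one of those bytes is valid UTF-8 on its own. Whether that is exactly what Festival did here is an assumption; the output alone does not prove it. The Python sketch below only demonstrates the byte-level effect.

```python
text = "一二三"

# Each of these characters is encoded as three bytes in UTF-8.
raw = text.encode("utf-8")
print(len(text), len(raw))

# A single byte from a multi-byte sequence cannot be decoded by itself,
# so each one is rendered as U+FFFD, the replacement character.
pieces = [bytes([b]).decode("utf-8", errors="replace") for b in raw]
print(pieces)
```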

mirfan899 • Jun 26 '19 06:06

On 26 Jun 2019, at 7:34, Muhammad Irfan wrote:

> I have lexicon.scm, questions.hed and phoneset.scm, plus audio and text data. I tried to build a Chinese TTS voice but failed: I'm stuck at prepare_labels_from_txt.sh. The docs do not yet describe a straightforward way to build a TTS voice for a new language.

Merlin is language-agnostic and relies on external tools to do front-end processing such as text normalisation and phonetisation.

If you read prepare_labels_from_txt.sh, you’ll see that it runs Festival’s front end. That point in the process is where you would add support for a new language.

In practice, this means that you need to replace Festival with a Chinese front-end, or at least a phonetiser for Chinese. Extending Festival to support Chinese is definitely NOT recommended!

Simply providing a Chinese dictionary to Festival won’t work - a front-end comprises a lot more than just a dictionary.
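
To make the last point concrete, here is a deliberately tiny sketch of the one piece a dictionary does cover: looking up a phone sequence for each character. The lexicon entries, the phone symbols, and the `phonetise` function are all invented for illustration; they are not part of Merlin, Festival, or any real Chinese front-end.

```python
# Toy lexicon: hypothetical pinyin-style phones with tone digits.
TOY_LEXICON = {
    "一": ["i1"],
    "二": ["er4"],
    "三": ["s", "an1"],
}

# Hypothetical mapping from punctuation to a silence phone.
PUNCTUATION = {"。": "sil", "，": "sil"}


def phonetise(text):
    """Map each character to phones; fail loudly on anything unknown."""
    phones = []
    for char in text:
        if char in PUNCTUATION:
            phones.append(PUNCTUATION[char])
        elif char in TOY_LEXICON:
            phones.extend(TOY_LEXICON[char])
        else:
            raise KeyError(f"no lexicon entry for {char!r}")
    return phones


print(phonetise("一二三。"))
```

Everything this sketch leaves out, such as text normalisation and word segmentation, is what a real front-end has to supply. Its output would also still need converting into full-context labels that match questions.hed before Merlin could use it.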

Regards, Simon

simonkingedinburgh • Jun 26 '19 09:06

Is https://github.com/CSTR-Edinburgh/Ossian a good choice for a Chinese front-end?

mirfan899 • Jun 26 '19 12:06

According to an issue on Ossian ( https://github.com/CSTR-Edinburgh/Ossian/issues/4 ), there is this Mandarin front-end: https://github.com/Jackiexiao/MTTS

seblemaguer • Jun 27 '19 11:06

I tried to use https://github.com/Jackiexiao/MTTS for Cantonese, but it got messy because it is hardcoded for Mandarin.

mirfan899 • Jul 04 '19 17:07