TensorFlowASR: What should I do if I want to train a Japanese model?


Open · ymzlygw opened this issue 4 years ago · 3 comments

Hi, my question is this: for English, if I understand correctly, the model directly outputs character indices, so there is a mapping between characters and index sequences. For Japanese, what does the model output, and how do I create a mapping between indices and Japanese kanji?
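A minimal sketch of such a mapping in Python, assuming a character vocabulary list already exists. The model would still output integer indices; only the lookup table differs from English. Reserving index 0 for a blank symbol, and the names `build_maps`, `encode`, and `decode`, are hypothetical illustrations, not TensorFlowASR's own featurizer API.

```python
from typing import Dict, List, Tuple

BLANK = "<blank>"  # assumption: index 0 is reserved for the CTC/transducer blank


def build_maps(characters: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Return (char_to_index, index_to_char) for a character vocabulary."""
    index_to_char = [BLANK] + characters
    char_to_index = {ch: i for i, ch in enumerate(index_to_char)}
    return char_to_index, index_to_char


def encode(text: str, char_to_index: Dict[str, int]) -> List[int]:
    # A character missing from the vocabulary raises KeyError, so gaps surface early.
    return [char_to_index[ch] for ch in text]


def decode(indices: List[int], index_to_char: List[str]) -> str:
    # Drop the blank index when turning model output back into text.
    return "".join(index_to_char[i] for i in indices if i != 0)


chars = ["こ", "ん", "に", "ち", "は", "日", "本", "語"]
c2i, i2c = build_maps(chars)
ids = encode("日本語", c2i)
print(ids)               # [6, 7, 8]
print(decode(ids, i2c))  # 日本語
```

Kanji need no special treatment here: each one is just another entry in the list, so the real question becomes how to build that list.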

Aug 23 '21 07:08 ymzlygw

I see english_characters; what about Japanese? And to get japanese_characters, should token_type be 'char' or 'bpe'? For reference: ENGLISH_CHARACTERS = [a-z],
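One way to assemble a Japanese counterpart of ENGLISH_CHARACTERS, sketched under assumptions: kana come from their fixed Unicode ranges (hiragana U+3041 to U+3096, katakana U+30A1 to U+30FA), while kanji and other symbols are collected from the training transcripts, because no short fixed list covers them. The function names and the one-character-per-line file format are hypothetical; check which vocabulary format TensorFlowASR's loader actually expects.

```python
from typing import Iterable, List

# Standard hiragana and katakana letters, taken from their Unicode blocks.
HIRAGANA = [chr(cp) for cp in range(0x3041, 0x3097)]  # U+3041..U+3096, 86 letters
KATAKANA = [chr(cp) for cp in range(0x30A1, 0x30FB)]  # U+30A1..U+30FA, 90 letters


def japanese_characters(transcripts: Iterable[str]) -> List[str]:
    """Kana from fixed ranges, then every other non-space character seen in the corpus."""
    vocab = HIRAGANA + KATAKANA
    known = set(vocab)
    extra = set()
    for line in transcripts:
        for ch in line:
            # Skip whitespace (including the full-width space U+3000) and known kana.
            if not ch.isspace() and ch not in known:
                extra.add(ch)
    # Sort extras (mostly kanji) by code point so the vocabulary is reproducible.
    return vocab + sorted(extra)


def write_vocabulary(path: str, vocab: List[str]) -> None:
    # Hypothetical format: one character per line, UTF-8 encoded.
    with open(path, "w", encoding="utf-8") as f:
        for ch in vocab:
            f.write(ch + "\n")


transcripts = ["日本語を話します", "東京へ行きます"]
vocab = japanese_characters(transcripts)
print(len(HIRAGANA), len(KATAKANA), len(vocab))  # 86 90 183
print(vocab[176:])  # the kanji collected from the transcripts
```

Any kanji that never appears in the transcripts is missing from the result, and on a real corpus the list can run into the thousands (the jōyō kanji list alone has 2,136 entries), which is what makes a hand-written character file impractical.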

Aug 24 '21 07:08 ymzlygw

@ymzlygw I think for Japanese, Korean, and Chinese we should use subwords instead of characters. If you can define a vocabulary containing all the characters of the language, as in English, then you can use character mode. As far as I know, those languages have characters that are combinations of "some characters in the alphabet", so I think defining a character vocabulary file by hand would be a lot of work.
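To illustrate what subword mode means here, a toy byte-pair encoding (BPE) learner in plain Python: each transcript starts as a sequence of single characters (Japanese needs no spaces for this), and the most frequent adjacent pair is merged repeatedly, so common sequences such as ありがとう become single tokens while the vocabulary size stays bounded by the number of merges. This is a generic textbook BPE, not TensorFlowASR's subword implementation; libraries such as SentencePiece provide production versions.

```python
from collections import Counter
from typing import List, Tuple


def get_pair_counts(corpus: List[List[str]]) -> Counter:
    # Count every adjacent symbol pair across all transcripts.
    counts: Counter = Counter()
    for symbols in corpus:
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += 1
    return counts


def merge_pair(symbols: List[str], pair: Tuple[str, str]) -> List[str]:
    # Replace each occurrence of `pair` with one concatenated symbol.
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(transcripts: List[str], num_merges: int) -> List[Tuple[str, str]]:
    # Japanese has no spaces, so each transcript starts as single characters.
    corpus = [list(t) for t in transcripts]
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        counts = get_pair_counts(corpus)
        if not counts:
            break
        best, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no pair repeats; further merges would only memorize
        merges.append(best)
        corpus = [merge_pair(s, best) for s in corpus]
    return merges


def apply_bpe(text: str, merges: List[Tuple[str, str]]) -> List[str]:
    # Apply the learned merges in the order they were learned.
    symbols = list(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


transcripts = ["ありがとうございます", "ありがとう", "おはようございます"]
merges = learn_bpe(transcripts, num_merges=10)
print(apply_bpe("ありがとうございます", merges))
```

A character that never occurred in training still comes out as a single-character token here, so rare kanji degrade gracefully instead of requiring a hand-curated list; the final vocabulary is the base characters plus one entry per merge.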

Oct 10 '21 09:10 nglehuy

Hi, I tried to train a Chinese model and the results don't seem good. I followed the Conformer steps the same way as for English. Could you suggest how to properly train a Chinese model? Thanks!

Feb 16 '22 13:02 psyma
