icefall: Why is a unique lexicon needed for Chinese ASR but not for English ASR?

Open yzchen563 opened this issue 1 year ago • 0 comments

To prepare a phone-based lang, generate_unique_lexicon.py is used in almost every Chinese ASR recipe (e.g. aishell-*), but not in the English ASR recipes (e.g. gigaspeech, librispeech). What is the reason?

I want to use k2.ctc_loss to handle words with multiple pronunciations in Chinese ASR transcripts, just as is done for the English corpora, where no special processing is applied to make the lexicon unique. Is that more accurate than using a unique lexicon?
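To make the question concrete, here is a minimal sketch of what I understand "making the lexicon unique" to mean: keeping only the first pronunciation listed for each word. This is an illustration, not the actual generate_unique_lexicon.py; the function name `make_unique_lexicon`, the toy entries, and the phone symbols are all hypothetical.

```python
from collections import OrderedDict


def make_unique_lexicon(entries):
    """Keep only the first pronunciation listed for each word.

    `entries` is a list of (word, phones) pairs in lexicon order.
    Hypothetical helper for illustration; not the icefall script itself.
    """
    unique = OrderedDict()
    for word, phones in entries:
        if word not in unique:  # later pronunciations of the same word are dropped
            unique[word] = phones
    return list(unique.items())


# Toy lexicon with one multi-pronunciation character (phone symbols are made up).
lexicon = [
    ("行", ["x", "ing2"]),
    ("行", ["h", "ang2"]),
    ("银行", ["y", "in2", "h", "ang2"]),
]

unique_lexicon = make_unique_lexicon(lexicon)
for word, phones in unique_lexicon:
    print(word, " ".join(phones))
```

With the unique lexicon, each word maps to exactly one phone sequence; with the original lexicon, a word like 行 keeps both alternatives, and that is the case I would like to handle with k2.ctc_loss.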

May 31 '24 05:05 yzchen563
