icefall

Why is a unique lexicon needed in Chinese ASR but not in English ASR?

Open · yzchen563 opened this issue 9 months ago · 0 comments

When preparing the phone-based lang directory, I see that generate_unique_lexicon.py is used in almost every Chinese ASR recipe (e.g. aishell-*), but not in the English ASR recipes (e.g. gigaspeech, librispeech). What is the reason?
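For context, here is a minimal sketch of what "making a lexicon unique" means. It assumes the rule is to keep only the first pronunciation listed for each word and drop the later alternatives; icefall's generate_unique_lexicon.py may differ in details. The helper name make_unique_lexicon and the lexicon entries are hypothetical and for illustration only.

```python
from typing import List, Tuple

# Hypothetical helper (not icefall's code): keep only the first pronunciation
# listed for each word, assuming that is the uniqueness rule.
def make_unique_lexicon(
    lexicon: List[Tuple[str, List[str]]]
) -> List[Tuple[str, List[str]]]:
    seen = set()
    unique = []
    for word, phones in lexicon:
        if word in seen:
            continue  # drop later alternative pronunciations of this word
        seen.add(word)
        unique.append((word, phones))
    return unique


# Made-up entries: each word appears with two pronunciations.
lexicon = [
    ("重", ["zh", "ong4"]),
    ("重", ["ch", "ong2"]),
    ("好", ["h", "ao3"]),
    ("好", ["h", "ao4"]),
]
print(make_unique_lexicon(lexicon))
```

After this step every word maps to exactly one phone sequence, so the training target for a transcript is fixed in advance.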

I want to use k2.ctc_loss to handle transcripts whose words have multiple pronunciations in Chinese ASR, the same way the English corpora are handled, where no special processing is applied to make the lexicon unique. Would that be more accurate than using a unique lexicon?
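To illustrate what keeping multiple pronunciations implies: a single transcript then corresponds to several possible phone sequences, and a graph-based CTC loss (k2.ctc_loss computes the loss over the paths of a supplied decoding graph) can in principle sum over all of them instead of committing to one. The sketch below simply enumerates those alternatives for a made-up two-word transcript. The helper pronunciation_variants and the lexicon are hypothetical, and explicit enumeration is only for illustration; a graph represents the alternatives compactly rather than listing them.

```python
import itertools
from typing import Dict, List

# Hypothetical helper: list every phone sequence a word sequence can map to
# when each word may have several pronunciations.
def pronunciation_variants(
    words: List[str], lexicon: Dict[str, List[List[str]]]
) -> List[List[str]]:
    choices = [lexicon[w] for w in words]
    variants = []
    # One pronunciation per word; the product covers all combinations.
    for combo in itertools.product(*choices):
        variants.append([phone for pron in combo for phone in pron])
    return variants


# Made-up lexicon with two pronunciations per word.
lexicon = {
    "重": [["zh", "ong4"], ["ch", "ong2"]],
    "好": [["h", "ao3"], ["h", "ao4"]],
}
for seq in pronunciation_variants(["重", "好"], lexicon):
    print(" ".join(seq))
```

The number of variants grows multiplicatively with the number of ambiguous words (here 2 × 2 = 4), whereas a unique lexicon always yields exactly one target sequence.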

yzchen563, May 31 '24 05:05