icefall
Why is a unique lexicon needed in Chinese ASR but not in English ASR?
To prepare a phone-based lang, I see that generate_unique_lexicon.py is used in almost every Chinese ASR recipe (e.g. aishell-*), but not in the English ASR recipes (e.g. gigaspeech, librispeech). What is the reason?
I want to use k2.ctc_loss to handle transcripts that contain words with multiple pronunciations in Chinese ASR, the same way the English recipes do, where no special processing is applied to make the lexicon unique. Is that more accurate than using a unique lexicon?
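To make the question concrete, here is a minimal sketch of what I understand "making the lexicon unique" to mean: keep only the first pronunciation listed for each word and drop the rest. This is not the actual code of generate_unique_lexicon.py; the function name `make_unique_lexicon`, the in-memory lexicon format, and the phone symbols are all assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical in-memory lexicon: one (word, phones) entry per pronunciation.
Lexicon = List[Tuple[str, List[str]]]


def make_unique_lexicon(lexicon: Lexicon) -> Lexicon:
    """Keep only the first pronunciation seen for each word (illustrative only)."""
    seen = set()
    unique: Lexicon = []
    for word, phones in lexicon:
        if word in seen:
            # A later pronunciation of an already-seen word is discarded.
            continue
        seen.add(word)
        unique.append((word, phones))
    return unique


# Made-up phone symbols; the first word has two pronunciations.
lexicon = [
    ("行", ["x", "ing2"]),
    ("行", ["h", "ang2"]),
    ("好", ["h", "ao3"]),
]
unique = make_unique_lexicon(lexicon)
for word, phones in unique:
    print(word, " ".join(phones))
```

With a lexicon reduced like this, every transcript maps to exactly one phone sequence, whereas keeping all pronunciations means the training graph has to represent the alternatives. My question is which of the two works better with k2.ctc_loss.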