sherpa-onnx
How to create LM for NeMo offline model
How do I create an LM to use with a NeMo offline model in the Node.js or Python version, and how do I create the HLG graphs for a NeMo model?
There is nothing special for NeMo models.
All models trained with the CTC loss, including but not limited to those from NeMo, icefall, and other frameworks, can follow what we have done in icefall to build HLG.fst.
Please see https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/prepare_lm.sh#L97
if [ ! -f $lang_dir/HL.fst ]; then
  ./local/prepare_lang_fst.py \
    --lang-dir $lang_dir \
    --ngram-G ./data/lm/G_3_gram.fst.txt
fi
You need to prepare:
- tokens.txt
- lexicon.txt
- words.txt
- an n-gram arpa file
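
Before building the graph, it can help to confirm that lexicon.txt only uses tokens that exist in tokens.txt. The following is a minimal sketch, not part of icefall or sherpa-onnx. It assumes tokens.txt has one `<token> <id>` pair per line, that lexicon.txt has one `<word> <token1> <token2> ...` entry per line, and that tokens contain no whitespace. The helper names `read_tokens` and `find_unknown_tokens` are hypothetical, and the inline data is made up for illustration.

```python
from typing import Iterable, List, Set, Tuple


def read_tokens(lines: Iterable[str]) -> Set[str]:
    """Collect token symbols from lines of the assumed form `<token> <id>`."""
    tokens = set()
    for line in lines:
        fields = line.split()
        if fields:
            tokens.add(fields[0])
    return tokens


def find_unknown_tokens(
    lexicon_lines: Iterable[str], tokens: Set[str]
) -> List[Tuple[str, str]]:
    """Return (word, token) pairs whose token is not listed in tokens.txt."""
    missing = []
    for line in lexicon_lines:
        fields = line.split()
        if len(fields) < 2:
            # Skip blank lines and entries without a pronunciation.
            continue
        word, pieces = fields[0], fields[1:]
        for piece in pieces:
            if piece not in tokens:
                missing.append((word, piece))
    return missing


# Toy data standing in for the contents of tokens.txt and lexicon.txt.
tokens = read_tokens(["<blk> 0", "HE 1", "LLO 2", "WORLD 3"])
lexicon = ["HELLO HE LLO", "WORLD WORLD", "FOO F OO"]
problems = find_unknown_tokens(lexicon, tokens)
print(problems)
```

With real files, the same functions can be called on `open(path, encoding="utf-8")` handles instead of the inline lists. Any pair reported here points to a lexicon entry the model cannot emit.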
tokens.txt is already contained in the NeMo model, I think. You can reuse words.txt and the 3-gram arpa file from LibriSpeech if you want.
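
If words.txt is reused, lexicon.txt still has to map each word onto the tokens of the model in use. The sketch below shows one way this could be done, under stated assumptions: words.txt has one `<word> <id>` pair per line, special symbols start with `<`, `#`, or `!` (for example `<eps>`, `#0`, `!SIL`), and `tokenize` is whatever function splits a word into the model's tokens. `build_lexicon`, `is_special`, and `toy_char_tokenizer` are hypothetical names. The toy tokenizer simply spells words out letter by letter and is only a stand-in for the real tokenizer of the model.

```python
from typing import Callable, Iterable, List


def is_special(word: str) -> bool:
    """Assumption: special symbols look like <eps>, <UNK>, #0 or !SIL."""
    return word.startswith(("<", "#", "!"))


def build_lexicon(
    words_lines: Iterable[str], tokenize: Callable[[str], List[str]]
) -> List[str]:
    """Produce lexicon.txt-style lines `<word> <token1> <token2> ...`."""
    entries = []
    for line in words_lines:
        fields = line.split()
        if not fields or is_special(fields[0]):
            continue
        word = fields[0]
        pieces = tokenize(word)
        if pieces:
            entries.append(" ".join([word] + pieces))
    return entries


def toy_char_tokenizer(word: str) -> List[str]:
    """Hypothetical stand-in tokenizer: one token per character."""
    return list(word)


# Toy data standing in for the contents of words.txt.
words_txt = ["<eps> 0", "!SIL 1", "<UNK> 2", "HELLO 3", "WORLD 4", "#0 5"]
lexicon_lines = build_lexicon(words_txt, toy_char_tokenizer)
print("\n".join(lexicon_lines))
```

For a BPE-based model, the toy tokenizer would be replaced by a call into the model's own tokenizer so that every piece it produces appears in tokens.txt.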