How to get the backoff-id from the LG graph?

Open v-tuenv opened this issue 2 years ago • 3 comments

I want to use the n-gram LM from https://github.com/k2-fsa/icefall/blob/b293db4baf1606cfe95066cf28ffde56173a7ddb/icefall/ngram_lm.py#L27

But I don't know how to get the backoff-id when building the LG graph.

L: Subword.

G: n-Gram Model.

Built with https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/prepare_lang_bpe.py and https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/compile_lg.py

v-tuenv avatar Dec 14 '22 03:12 v-tuenv

I assume you are referring to the G graph, right?

The backoff ID is the ID of #0.

If your G is a token-level G, then #0 should come from tokens.txt. If your G is a word-level G, then #0 is from words.txt.
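As a minimal sketch of that lookup, assuming tokens.txt and words.txt use the usual one-entry-per-line `<symbol> <id>` format: the helper name `find_symbol_id` and the sample table below are made up for illustration and are not part of icefall.

```python
from typing import Iterable, Optional


def find_symbol_id(lines: Iterable[str], symbol: str = "#0") -> Optional[int]:
    """Return the integer ID of `symbol`, or None if it is absent.

    Assumes each non-empty line has the form "<symbol> <id>".
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        sym, idx = fields
        if sym == symbol:
            return int(idx)
    return None


# Made-up table contents for illustration. For a real file, pass an open
# file object instead, e.g. open("path/to/tokens.txt", encoding="utf-8").
sample_table = ["<blk> 0", "<unk> 1", "a 2", "#0 3", "#1 4"]
backoff_id = find_symbol_id(sample_table)
print(backoff_id)
```

Run it against tokens.txt for a token-level G and against words.txt for a word-level G; the two files will generally give different IDs for #0.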

csukuangfj avatar Dec 14 '22 03:12 csukuangfj

@csukuangfj Thank you for the super quick reply. I use an LG graph where L is built from subwords and G is an n-gram model. I followed 3 steps:

  1. Get the n-gram graph with generate-lm.sh.
  2. Get the L graph with https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/prepare_lang_bpe.py
  3. Compile L and G.

In L, #0 has ID 1024, and in G, #0 has ID 5741. I am confused about which value is the right backoff-id for NGram (LG graph). Thanks.

v-tuenv avatar Dec 14 '22 03:12 v-tuenv

By the way, we only use the argument --backoff-id for G used in shallow fusion. What is the use of backoff ID for an LG graph?
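For context, in an n-gram G the arcs labeled #0 are backoff arcs: when the current state has no arc for the next symbol, the backoff arc is followed and its cost is added. The toy sketch below illustrates that idea only. It is not icefall's implementation, and the graph, IDs, costs, and the name `next_state_and_cost` are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical symbol IDs for this toy example: 1 = "a", 2 = "b", 3 = "#0".
BACKOFF_ID = 3

# Toy G: state -> {label: (next_state, cost)}. State 0 is the unigram state.
Arcs = Dict[int, Dict[int, Tuple[int, float]]]
TOY_G: Arcs = {
    0: {1: (1, 2.0), 2: (2, 3.0)},
    1: {2: (2, 0.5), BACKOFF_ID: (0, 1.0)},
    2: {BACKOFF_ID: (0, 0.5)},
}


def next_state_and_cost(
    arcs: Arcs, state: int, label: int, backoff_id: int
) -> Tuple[int, float]:
    """Consume `label` from `state`, following backoff arcs when needed."""
    cost = 0.0
    while True:
        out = arcs[state]
        if label in out:
            nxt, arc_cost = out[label]
            return nxt, cost + arc_cost
        if backoff_id not in out:
            raise KeyError(f"no arc for label {label} and no backoff arc")
        # No matching arc: take the backoff arc and accumulate its cost.
        state, arc_cost = out[backoff_id]
        cost += arc_cost


# State 2 has no arc for "a", so the lookup backs off to state 0 first.
print(next_state_and_cost(TOY_G, 2, 1, BACKOFF_ID))
```

With a wrong backoff ID the lookup cannot find the backoff arcs, which is why the value has to match the symbol table that G itself was built with.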

csukuangfj avatar Dec 14 '22 04:12 csukuangfj