
Availability of different beam search methods as in icefall

Open bhaswa opened this issue 1 year ago • 14 comments

Hi,

In icefall, there are multiple decoding methods available, e.g. greedy_search, beam_search, modified_beam_search, fast_beam_search, and fast_beam_search_nbest. There are also several decoding methods that use an LM (modified_beam_search_lm_shallow_fusion, modified_beam_search_LODR, modified_beam_search_lm_rescore, modified_beam_search_lm_rescore_LODR). But in sherpa-onnx, there are only two valid decoding methods (greedy_search and modified_beam_search). Can we use the other decoding methods from icefall in sherpa-onnx as well?

bhaswa avatar Dec 05 '23 07:12 bhaswa

I am afraid you cannot. We have implemented only greedy_search and modified_beam_search for transducer models.

fast_beam_search requires k2 but sherpa-onnx does not depend on k2.

csukuangfj avatar Dec 05 '23 07:12 csukuangfj

In case an LM is used, can LODR, rescoring, or shallow fusion also not be used in sherpa-onnx?

bhaswa avatar Dec 05 '23 07:12 bhaswa

No, you can use RNN LM rescoring with sherpa-onnx.

Please search for the PR for RNN LM rescoring in sherpa-onnx. There are usage examples in the comments of that PR.

csukuangfj avatar Dec 05 '23 07:12 csukuangfj

So by default, if I use --lm and --decoding-method=modified_beam_search, will it perform LM rescoring?

bhaswa avatar Dec 05 '23 08:12 bhaswa

You need to pass the RNN LM model.

csukuangfj avatar Dec 05 '23 08:12 csukuangfj

Yes, the RNN LM needs to be provided.

bhaswa avatar Dec 05 '23 08:12 bhaswa

https://github.com/k2-fsa/sherpa-onnx/pull/353

From the above pull request, it seems that shallow fusion is also implemented. Can you provide the usage for it?

bhaswa avatar Dec 05 '23 08:12 bhaswa

https://github.com/k2-fsa/sherpa-onnx/pull/147

Please search for shallow fusion in the related PR. You can find usage examples in the comments.

csukuangfj avatar Dec 05 '23 09:12 csukuangfj

https://github.com/k2-fsa/sherpa-onnx/pull/125 From the above PR, I found the usage for LM rescoring as below:

./build/bin/sherpa-onnx-offline \
  --tokens=./sherpa-onnx-zipformer-en-2023-04-01/tokens.txt \
  --encoder=./sherpa-onnx-zipformer-en-2023-04-01/encoder-epoch-99-avg-1.onnx \
  --decoder=./sherpa-onnx-zipformer-en-2023-04-01/decoder-epoch-99-avg-1.onnx \
  --joiner=./sherpa-onnx-zipformer-en-2023-04-01/joiner-epoch-99-avg-1.onnx \
  --lm-scale=0.5 \
  --num-threads=2 \
  --decoding-method=modified_beam_search \
  --max-active-paths=4 \
  ./2414-159411-0024.wav

https://github.com/k2-fsa/sherpa-onnx/pull/147 From this PR, I found the usage of shallow fusion as below:

./bin/sherpa-onnx \
  exp/data/lang_char_bpe/tokens.txt \
  exp/exp/encoder-epoch-99-avg-1.onnx \
  exp/exp/decoder-epoch-99-avg-1.onnx \
  exp/exp/joiner-epoch-99-avg-1.onnx \
  exp/test_wavs/BAC009S0764W0164.wav \
  2 \
  modified_beam_search \
  exp/exp/with-state-epoch-999-avg-1.onnx

Comparing the two commands above, the only difference I found is in the executable. I could not find any difference in the arguments that would differentiate between rescoring and shallow fusion.

If I want to use the Python API, how can I differentiate between rescoring and shallow fusion?

bhaswa avatar Dec 05 '23 09:12 bhaswa

From the above PR I found the usage for LM rescore as below:

Please take a look at the usage in the PR comment. You have found the wrong place in the PR.

csukuangfj avatar Dec 05 '23 10:12 csukuangfj

[Screenshot of the PR comment showing the correct usage]

csukuangfj avatar Dec 05 '23 10:12 csukuangfj

My bad. I copied the wrong segment.

But I still cannot find any difference in the arguments between https://github.com/k2-fsa/sherpa-onnx/pull/125 (LM rescoring) and https://github.com/k2-fsa/sherpa-onnx/pull/147 (shallow fusion).

I want to use the Python API. How can I differentiate between rescoring and shallow fusion?

bhaswa avatar Dec 05 '23 10:12 bhaswa

between rescoring and shallow fusion

Could you explain the difference between rescoring and shallow fusion?

csukuangfj avatar Dec 06 '23 01:12 csukuangfj

In icefall, we can use an LM with rescoring and with shallow fusion.

The command for shallow fusion is:

./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 99 \
  --avg 1 \
  --use-averaged-model False \
  --beam-size 4 \
  --exp-dir $exp_dir \
  --max-duration 600 \
  --decode-chunk-len 32 \
  --decoding-method modified_beam_search_lm_shallow_fusion \
  --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
  --use-shallow-fusion 1 \
  --lm-type rnn \
  --lm-exp-dir $lm_dir \
  --lm-epoch 99 \
  --lm-scale $lm_scale \
  --lm-avg 1 \
  --rnn-lm-embedding-dim 2048 \
  --rnn-lm-hidden-dim 2048 \
  --rnn-lm-num-layers 3 \
  --lm-vocab-size 500
The command for rescoring is:

./pruned_transducer_stateless7_streaming/decode.py \
  --epoch 99 \
  --avg 1 \
  --use-averaged-model False \
  --beam-size 4 \
  --exp-dir $exp_dir \
  --max-duration 600 \
  --decode-chunk-len 32 \
  --decoding-method modified_beam_search_lm_rescore \
  --bpe-model ./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model \
  --use-shallow-fusion 0 \
  --lm-type rnn \
  --lm-exp-dir $lm_dir \
  --lm-epoch 99 \
  --lm-scale $lm_scale \
  --lm-avg 1 \
  --rnn-lm-embedding-dim 2048 \
  --rnn-lm-hidden-dim 2048 \
  --rnn-lm-num-layers 3 \
  --lm-vocab-size 500

In sherpa-onnx, how can I use an LM with these two different settings? Also, with the commands given in the sherpa-onnx pull requests (https://github.com/k2-fsa/sherpa-onnx/pull/125 and https://github.com/k2-fsa/sherpa-onnx/pull/147), will the LM run with rescoring or with shallow fusion?

bhaswa avatar Dec 06 '23 05:12 bhaswa
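To make the distinction in the discussion above concrete: shallow fusion adds the LM score at every decoding step (so the LM influences which hypotheses survive beam pruning), while rescoring applies the LM only once, to re-rank the finished n-best list. The toy sketch below illustrates this in plain Python; all scores, the two-token vocabulary, and the beam size of 1 are made up for illustration, and this is not the sherpa-onnx or icefall implementation.

```python
import math

# Toy log-probabilities for a two-token vocabulary. All numbers are
# invented purely to illustrate the two strategies.
AM = {"a": math.log(0.6), "b": math.log(0.4)}  # acoustic model
LM = {"a": math.log(0.3), "b": math.log(0.7)}  # language model

def shallow_fusion_step(hyps, lm_scale=0.5):
    """Shallow fusion: the LM score is added at every decoding step,
    so it influences which hypotheses survive beam pruning."""
    new_hyps = {}
    for prefix, score in hyps.items():
        for tok in AM:
            new_hyps[prefix + tok] = score + AM[tok] + lm_scale * LM[tok]
    best = max(new_hyps, key=new_hyps.get)  # beam size 1 for brevity
    return {best: new_hyps[best]}

def rescore_nbest(nbest, lm_scale=0.5):
    """Rescoring: the search runs with AM scores only; the LM is
    applied once, at the end, to re-rank the finished n-best list."""
    def lm_score(hyp):
        return sum(LM[tok] for tok in hyp)
    return sorted(nbest,
                  key=lambda h: nbest[h] + lm_scale * lm_score(h),
                  reverse=True)

# Shallow fusion: the LM participates in every step of the search.
hyps = {"": 0.0}
for _ in range(2):
    hyps = shallow_fusion_step(hyps)

# Rescoring: the n-best list is produced with the AM alone; the LM
# only re-ranks it afterwards.
nbest = {"aa": 2 * AM["a"], "bb": 2 * AM["b"]}
reranked = rescore_nbest(nbest)
```

Both paths combine the same two scores (am_score + lm_scale * lm_score); the difference is only *when* the LM is consulted, which is why the CLI arguments alone may not reveal which strategy a given binary uses.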