sherpa
decoded text not similar
I have built models based on the conformer-ctc LibriSpeech recipe. I am comparing the decoded text of the test set against the output of sherpa's offline_ctc_asr. The decoded text is not identical for the same file. I want to get exactly the same decoded text. Please help.
Do you use the same decoding method? Does this happen for all files (i.e. the WERs of a bunch of files are worse) or just for one wav?
The current model has been trained for epochs 0-19.
The test files are decoded using "./conformer_ctc/decode.py --epoch 19 --avg 1 --exp-dir conformer_ctc/exp".
The epoch-19 model is exported with "python conformer_ctc/export.py --epoch 19 --avg 1 --exp-dir conformer_ctc/exp --lang-dir data/lang_bpe_500 --jit 1" for use with sherpa.
The exported model is used with "./sherpa/bin/offline_ctc_asr.py --nn-model conformer_ctc/exp/cpu_jit.pt --tokens data/lang_bpe_500/tokens.txt --use-gpu false --HLG data/lang_bpe_500/HLG.pt --lm-scale 5.0 audio_files/1000000194.wav". I have tried different values of --lm-scale on a few different files, but the decoded text from decode.py and offline_ctc_asr.py is still not the same.
Could you post the decoding logs of ./conformer_ctc/decode.py --epoch 19 --avg 1 --exp-dir conformer_ctc/exp
and ./sherpa/bin/offline_ctc_asr.py --nn-model conformer_ctc/exp/cpu_jit.pt --tokens data/lang_bpe_500/tokens.txt --use-gpu false --HLG data/lang_bpe_500/HLG.pt --lm-scale 5.0 audio_files/1000000194.wav
so that we can compare the decoding configurations?
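While collecting the logs, it may also help to check whether the mismatch affects every file or only a few. The sketch below is not part of icefall or sherpa. It assumes you have already gathered the two sets of transcripts into plain Python dicts mapping an utterance ID to its decoded text, and the function names and sample IDs are hypothetical. It uses only the standard-library difflib to list the word spans that differ per file.

```python
from difflib import SequenceMatcher


def word_diff(text_a, text_b):
    """Return the non-matching word spans between two transcripts.

    Each item is (op, words_from_a, words_from_b), where op is one of
    "replace", "delete" or "insert" as reported by difflib.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False keeps difflib from ignoring very frequent words.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (op, words_a[i1:i2], words_b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def compare_transcripts(results_a, results_b):
    """Compare two {utterance_id: text} dicts.

    Only IDs present in both dicts are compared. The return value maps
    each differing ID to its list of word-level differences.
    """
    differing = {}
    for utt_id in sorted(set(results_a) & set(results_b)):
        diffs = word_diff(results_a[utt_id], results_b[utt_id])
        if diffs:
            differing[utt_id] = diffs
    return differing


if __name__ == "__main__":
    # Hypothetical sample data, not real decoding output.
    from_decode_py = {"utt1": "hello world", "utt2": "good morning"}
    from_sherpa = {"utt1": "hello world", "utt2": "good evening"}
    for utt_id, diffs in compare_transcripts(from_decode_py, from_sherpa).items():
        print(utt_id, diffs)
```

If only a handful of IDs show up in the result, the difference is more likely tied to specific audio files. If nearly all of them do, a mismatch in decoding method or configuration is the more likely cause.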