In the wenetspeech recipe, fast_beam_search_LG almost always gets worse WER than greedy_search!
Collecting environment information...

k2 version: 1.24.3
Build type: Release
Git SHA1: 42e92fdd4097adcfe9937b4d2df7736d227b8e85
Git date: Wed Jun 28 09:50:36 2023
Cuda used to build k2: 11.6
cuDNN used to build k2: 8.2.0
Python version used to build k2: 3.9
OS used to build k2: Ubuntu 20.04.6 LTS
CMake version: 3.26.4
GCC version: 7.5.0
PyTorch version used to build k2: 1.13.1+cu116
PyTorch is using Cuda: 11.6
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels: False
Disable checks: False
Max cpu memory allocate: 214748364800 bytes (or 200.0 GB)
k2 abort: False
Resource: https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615
Test set: wenetspeech DEV
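For anyone reproducing this, the pretrained model can be fetched with git-lfs; a minimal sketch (the clone target is chosen to match the paths used in the command below):

git lfs install
git clone https://huggingface.co/pkufool/icefall-asr-zipformer-streaming-wenetspeech-20230615 \
  download/huggingface/icefall-asr-zipformer-streaming-wenetspeech-20230615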
Bash command:
exp_dir=download/huggingface/icefall-asr-zipformer-streaming-wenetspeech-20230615/exp
lang_dir=download/huggingface/icefall-asr-zipformer-streaming-wenetspeech-20230615/data/lang_char
decode_method=greedy_search  # decode_method=fast_beam_search_LG
./zipformer/decode.py \
  --epoch ${ep} \
  --avg ${avg} \
  --exp-dir ${exp_dir}/ \
  --lang-dir ${lang_dir} \
  --max-duration 800 \
  --decoding-method ${decode_method} \
  --blank-penalty ${blank_penalty} \
  --ngram-lm-scale ${nls} \
  --ilme-scale ${ilme_scale} \
  --manifest-dir data/fbank/ \
  --causal 1 \
  --chunk-size ${chunk_size} \
  --left-context-frames ${left_context}
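For reference, the variables that the snippet above leaves unset could be assigned as follows; these are hypothetical illustration values, not the values used in the runs reported below:

ep=20              # --epoch: hypothetical
avg=1              # --avg: hypothetical
blank_penalty=0.0  # --blank-penalty: hypothetical
nls=0.5            # --ngram-lm-scale: hypothetical, only relevant for the *_LG methods
ilme_scale=0.1     # --ilme-scale: hypothetical
chunk_size=16      # both 16 and 32 were tested, per the result below
left_context=128   # --left-context-frames: hypothetical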
Result:
With both chunk=16 and chunk=32, I cannot get a better WER with fast_beam_search_LG than with greedy_search.
Have you tried the LODR method? Also, assuming your LG is based on Chinese words, what is the vocabulary coverage of your dev set like?
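As a rough way to check that coverage, you could count how many dev-set words appear in the LG vocabulary. A sketch, assuming the dev transcripts are word-segmented, one utterance per line, in a plain-text file (data/dev_text is a hypothetical path) and that words.txt in the lang dir lists the LG vocabulary:

awk 'NR==FNR {vocab[$1]=1; next}
     {for (i=1; i<=NF; i++) {total++; if (!($i in vocab)) oov++}}
     END {printf "OOV rate: %.2f%% (%d/%d words)\n", 100*oov/total, oov, total}' \
  ${lang_dir}/words.txt data/dev_text

A high OOV rate would explain LG decoding falling behind greedy search, since the LG cannot produce words outside its vocabulary.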
In my experiments, I have always found the "nbest" variants to be better than the one-best versions, e.g., fast_beam_search_nbest_LG better than fast_beam_search_LG.
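A sketch of how the nbest variant might be invoked, reusing the variables from your command; --num-paths and --nbest-scale control how the n-best list is sampled from the lattice, and the values below are the usual defaults, not tuned numbers (flag support may differ slightly across recipes):

./zipformer/decode.py \
  --epoch ${ep} \
  --avg ${avg} \
  --exp-dir ${exp_dir}/ \
  --lang-dir ${lang_dir} \
  --max-duration 800 \
  --decoding-method fast_beam_search_nbest_LG \
  --num-paths 200 \
  --nbest-scale 0.5 \
  --ngram-lm-scale ${nls} \
  --manifest-dir data/fbank/ \
  --causal 1 \
  --chunk-size ${chunk_size} \
  --left-context-frames ${left_context}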
Usually, you would also need to play around with the --beam parameter to balance insertions against deletions. It looks like you have significantly more deletions at the moment; maybe you can try increasing the beam.
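For example, a simple sweep over --beam (the candidate values are only illustrative starting points; pick the one with the best dev WER):

for beam in 8 12 16 20; do
  ./zipformer/decode.py \
    --epoch ${ep} \
    --avg ${avg} \
    --exp-dir ${exp_dir}/ \
    --lang-dir ${lang_dir} \
    --max-duration 800 \
    --decoding-method fast_beam_search_LG \
    --beam ${beam} \
    --ngram-lm-scale ${nls} \
    --manifest-dir data/fbank/ \
    --causal 1 \
    --chunk-size ${chunk_size} \
    --left-context-frames ${left_context}
done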