icefall
Performance discrepancy in WER between parallel decoding and single-threaded decoding for a non-streaming zipformer model
I have exported a non-streaming Zipformer model to ONNX format and used the offline-websocket-client-decode-files-paralell.py and offline_decode.py scripts to decode audio files. My goal is to decode files in parallel and still get accurate transcriptions. The WER after decoding files in parallel with offline-websocket-client-decode-files-paralell.py is 46%, compared to 21% when decoding with a single thread using offline_decode.py. I do not understand why there is such a large performance difference between the two scripts. I would like to get the same accurate results when decoding files in parallel. Can someone help me resolve this issue?
Thanks in advance.
Are you able to reproduce it with our pre-trained models?
Yes, I decoded the audio file "OSR_us_000_0011_8k.wav" from https://www.voiptroubleshooter.com/open_speech/american.html using the zipformer-small pre-trained model https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-small-2023-05-16
Following are the commands I used for decoding:
Starting the pre-trained model server:
./bin/sherpa-onnx-offline-websocket-server \
--port=8894 \
--num-work-threads=5 \
--tokens=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/tokens.txt \
--encoder=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/encoder-epoch-99-avg-1.onnx \
--decoder=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/decoder-epoch-99-avg-1.onnx \
--joiner=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/joiner-epoch-99-avg-1.onnx \
--decoding-method=modified_beam_search \
--log-file=/home/icefall/log.txt \
--max-batch-size=5
Decoding files in parallel:
python3 /home/icefall/sherpa-onnx/python-api-examples/offline-websocket-client-decode-files-paralell.py \
--server-addr localhost \
--server-port 8894 \
audio/OSR_us_000_0011_8k.wav
...
Decoding with offline-decode-files.py:
python3 \
/home/icefall/sherpa-onnx/python-api-examples/offline-decode-files.py \
--tokens=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/tokens.txt \
--encoder=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/encoder-epoch-99-avg-1.onnx \
--decoder=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/decoder-epoch-99-avg-1.onnx \
--joiner=/home/icefall/sherpa-onnx/sherpa-onnx-zipformer-small-en-2023-06-26/joiner-epoch-99-avg-1.onnx \
--decoding-method=modified_beam_search \
--num-threads=1 \
--debug=false \
--sample-rate=8000 \
--feature-dim=80 \
audio/OSR_us_000_0011_8k.wav
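One thing worth double-checking in the command above is the --sample-rate=8000 flag: if it does not match the actual rate stored in the wave file's header, the extracted features will not match what the model expects and the WER degrades. A minimal stdlib sketch for inspecting a file's header before choosing the flag (the path below is just the example file from this thread):

```python
import wave

def wav_info(path):
    """Return (sample_rate, num_channels, num_frames) from a WAV header."""
    with wave.open(path, "rb") as f:
        return f.getframerate(), f.getnchannels(), f.getnframes()

# e.g. rate, channels, frames = wav_info("audio/OSR_us_000_0011_8k.wav")
```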
Following are the decoding results:
offline-decode-files.py result:
THE BOY WAS THERE WHEN THE SUN ROSE A ROTTED YOUTH TO CATCH PAINT FAMINE THE SOURCE OF THE HUGE RIVER IS THE CLEAR SPRING TAKE THE BOFF SHAPE AND FOLLOW THROUGH HELP THE WOMAN GET BACK AT HER FEET A POT OF KEY HELPS HALF EVENING THE SMOKY FIGHT IS BLACK FRAME AND HEAT THE SOFT COUSIN BROOK THE MAN'S FALL THE THOUGHT BRIEF TEAM ACROSS THE FEET THE GIRL AT THE BOOT FILL FIFTY
offline-websocket-client-decode-files-paralell.py:
THE BOY WAS THERE WHEN THE SUN ROSE A ROD IS USED TO CATCH PINK SALMON THE SOURCE OF THE HUGE RIVER IS THE CLEAR SPRING KICK THE BALL STRAIGHT AND FOLLOW THROUGH HELP THE WOMAN GET BACK TO HER FEET A POT OF TEA HELPS TO PASS THE EVENING SMOKY FIRES LACKS FLAME AND HEAT THE SOFT CUSHION BROKE THE MAN'S FALL THE SALT BREEZE CAME ACROSS THE SEA THE GIRL AT THE BOOTH SOLD FIFTY BONDS
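For reference, WER figures like the 46% and 21% quoted above come from word-level edit distance between hypothesis and reference. A minimal self-contained sketch (just an illustration, not the scoring script used for these numbers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```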
As you can see, the results are not the same. I also observed that offline_decode.py lets me pass a bpe-model argument, but when I decode in parallel with offline-websocket-client-decode-files-paralell.py, I cannot pass a bpe-model; it shows an invalid-option error. Why can't we pass the bpe-model as an argument when decoding in parallel with a websocket server?
Thanks in advance
Did anybody have a chance to look into it?
sorry, missed it. I am taking care of it.
As you can see, the results are not the same. I also observed that offline_decode.py lets me pass a bpe-model argument, but when I decode in parallel with offline-websocket-client-decode-files-paralell.py, I cannot pass a bpe-model; it shows an invalid-option error. Why can't we pass the bpe-model as an argument when decoding in parallel with a websocket server?
The bpe model is used by the server for contextual biasing.
offline-websocket-client-decode-files-paralell.py is a client: it accepts only the server address, the server port, and wave files. It does not accept any other options.
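To illustrate that point, the client's command-line interface is roughly of the following shape (a simplified sketch, not the actual script; the default values here are made up), which is why any other flag such as --bpe-model is rejected as an invalid option:

```python
import argparse

def get_parser():
    # Sketch of a websocket-client parser: only server address,
    # server port, and the wave files to decode are accepted.
    parser = argparse.ArgumentParser()
    parser.add_argument("--server-addr", type=str, default="localhost",
                        help="Address of the websocket server")
    parser.add_argument("--server-port", type=int, default=6006,
                        help="Port of the websocket server")
    parser.add_argument("sound_files", type=str, nargs="+",
                        help="Wave files to send to the server")
    return parser
```

Any unrecognized flag makes argparse print an error and exit, matching the invalid-option error reported above.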
Okay. I think that because we cannot pass the bpe-model when starting the offline websocket server, the results are worse with offline-websocket-client-decode-files-paralell.py than with offline-decode-files.py, where we can pass the bpe-model. Is that correct, or is there some other reason?