
Fix whisper

csukuangfj opened this pull request • 4 comments

Fixes #633

@szaszakgy

Could you use this PR to test the wave that fails to decode?

Please first try the test.py from this PR. You will need to re-export the model using the latest export-onnx.py from this PR.

I will fix the C++ code tomorrow.

CC @GaryLaurenceauAva

csukuangfj avatar Jun 20 '24 13:06 csukuangfj

Hi @csukuangfj, thanks for the feedback! I am able to run test.py on the original recording. It returns a transcription, which is however truncated: the last 9 words are missing, whereas testing on the shared file problem_01.wav returns a perfect result. I tested 3 more problem recordings. Two of them return transcripts that are truncated compared to original whisper. One of them still fails with the previous error: 'INVALID_ARGUMENT : Non-zero status code returned while running Expand node. Name:'/Expand' Status Message: invalid expand shape'. This recording contains repetitions (self-corrections or stuttering), but it can be decoded with original whisper without issues.
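For context, ONNX's Expand node uses multidirectional (numpy-style) broadcasting, and "invalid expand shape" means a runtime-computed target shape was incompatible with the input's shape. A minimal pure-Python sketch of that shape rule (an illustration of the failure class only, not the model's actual graph):

```python
from itertools import zip_longest

def expand_shape(in_shape, target):
    """ONNX-Expand-style shape check: dims are aligned from the right,
    and each pair must be equal or contain a 1 (numpy broadcasting)."""
    out = []
    for a, b in zip_longest(reversed(in_shape), reversed(target), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"invalid expand shape: {a} vs {b}")
        out.append(max(a, b))
    return tuple(reversed(out))

# A (1, 4) tensor expands cleanly to (3, 4) ...
print(expand_shape((1, 4), (3, 4)))  # -> (3, 4)

# ... but (2, 4) against (3, 4) is the kind of mismatch the decoder's
# /Expand node surfaces as an INVALID_ARGUMENT error.
try:
    expand_shape((2, 4), (3, 4))
except ValueError as e:
    print(e)  # -> invalid expand shape: 2 vs 3
```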

szaszakgy avatar Jun 20 '24 16:06 szaszakgy

could you share the problematic wav and tell us which model you are using?

csukuangfj avatar Jun 20 '24 23:06 csukuangfj

Hi, I was using the base.en model (I exported it first as you requested and ran it through test.py). I am attaching the audio for which I still get the onnxruntime error. For the others, as said, the returned transcripts are trimmed at the point where, with the previous version, I had observed the decoder getting stuck repeating the same token(s).
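The stuck-decoding symptom (the same token emitted over and over) is commonly handled by capping consecutive repeats inside the greedy decode loop. A minimal sketch with a hypothetical toy "model" callable, not the PR's actual fix:

```python
def greedy_decode(next_token, max_tokens=448, max_repeat=5, eot=50257):
    """Greedy decoding with a simple repetition guard.

    next_token: callable mapping the token history to the next token id.
    Stops at end-of-text, at the length cap, or when one token repeats
    max_repeat times in a row (the failure mode described above).
    """
    tokens, run = [], 0
    while len(tokens) < max_tokens:
        t = next_token(tokens)
        if t == eot:
            break
        # Count the current run of identical tokens and bail out early.
        run = run + 1 if tokens and t == tokens[-1] else 1
        if run >= max_repeat:
            break
        tokens.append(t)
    return tokens

# A toy "model" that emits 1, 2, 3 and then gets stuck on token 7:
stuck = greedy_decode(lambda h: [1, 2, 3][len(h)] if len(h) < 3 else 7)
print(stuck)  # -> [1, 2, 3, 7, 7, 7, 7]
```

The guard turns an infinite repetition into a truncated (but terminating) transcript, which matches the trimmed outputs described above.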

szaszakgy avatar Jun 21 '24 09:06 szaszakgy

Tested whisper models with DirectML / CPU on Windows, using the newly exported models.

tiny.int8 CPU (success)

```
python .\scripts\whisper\export-onnx.py --model tiny
python .\scripts\whisper\test.py --encoder .\tiny-encoder.int8.onnx --decoder .\tiny-decoder.int8.onnx --tokens tiny-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
2024-08-09 18:05:44.3137218 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:05:44.3223491 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:05:44.9778383 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:05:44.9866576 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.
```

tiny.int8 DML (success)

```
python .\scripts\whisper\test.py --encoder .\tiny-encoder.int8.onnx --decoder .\tiny-decoder.int8.onnx --tokens tiny-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
2024-08-09 18:24:38.9712836 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:24:38.9799328 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:24:39.4824920 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:24:39.4912264 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.
```

medium.int8 CPU (success)

```
python .\scripts\whisper\export-onnx.py --model medium
python .\scripts\whisper\test.py --encoder .\medium-encoder.int8.onnx --decoder .\medium-decoder.int8.onnx --tokens .\medium-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
After early nightfall the yellow lamps would light up here and there the squalid quarter of the brothels.
```

medium.int8 DML (failed)

```
python .\scripts\whisper\test.py --encoder .\medium-encoder.int8.onnx --decoder .\medium-decoder.int8.onnx --tokens .\medium-tokens.txt --language en --task transcribe sherpa-onnx-whisper-medium\test_wavs\0.wav
2024-08-09 18:22:35.7186952 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:22:35.7283108 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:22:40.3298896 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-08-09 18:22:40.3379720 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-08-09 18:22:45.4154322 [E:onnxruntime:, sequential_executor.cc:516 onnxruntime::ExecuteKernel] Non-zero status code returned while running MemcpyToHost node. Name:'Memcpy_token_172' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FF9A58D300E: (caller: 00007FF9A601D211) Exception(3) tid(2f14) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

Traceback (most recent call last):
  File "D:\sherpa\sherpa-onnx\scripts\whisper\test.py", line 415, in <module>
    main()
  File "D:\sherpa\sherpa-onnx\scripts\whisper\test.py", line 370, in main
    logits, n_layer_self_k_cache, n_layer_self_v_cache = model.run_decoder(
                                                         ^^^^^^^^^^^^^^^^^^
  File "D:\sherpa\sherpa-onnx\scripts\whisper\test.py", line 154, in run_decoder
    logits, out_n_layer_self_k_cache, out_n_layer_self_v_cache = self.decoder.run(
                                                                 ^^^^^^^^^^^^^^^^^
  File "C:\Users\User\.rye\py\[email protected]\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running MemcpyToHost node. Name:'Memcpy_token_172' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2557)\onnxruntime_pybind11_state.pyd!00007FF9A58D300E: (caller: 00007FF9A601D211) Exception(3) tid(2f14) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.
```
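The CPU-vs-DML split in the runs above can be driven from one place when the onnxruntime session is created. A sketch of the standard provider-priority pattern (whether it avoids the DML crash above is untested; the function names are hypothetical):

```python
def provider_list(use_dml: bool) -> list:
    """Execution providers in priority order: DirectML first when
    requested, with CPU always present as the fallback."""
    return (["DmlExecutionProvider"] if use_dml else []) + ["CPUExecutionProvider"]

def make_session(model_path: str, use_dml: bool = True):
    # Assumes the onnxruntime-directml package is installed.
    import onnxruntime as ort
    return ort.InferenceSession(model_path, providers=provider_list(use_dml))
```

onnxruntime already falls back to CPU per node when the GPU provider cannot place an op; the VerifyEachNodeIsAssignedToAnEp warnings in the logs above are exactly that placement report.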

thewh1teagle avatar Aug 09 '24 15:08 thewh1teagle