WhisperLiveKit

Trying SimulStreaming results in 2 errors: not able to use the warmup file and tensor mismatch

Open · Drasek opened this issue 5 months ago · 8 comments

Hello, after upgrading from 0.17 to 0.22 I wanted to try out SimulStreaming. I solved my first problem (see the other issue) by adding a hardcoded path to the loading of the licence file. Now I get these two errors:

whisperlivekit-server --backend simulstreaming --model large-v3 --language de --warmup-file /root/data/test_weidel.wav --host 0.0.0.0 --port 8000

INFO: Started server process [14436]
INFO: Waiting for application startup.
********************************************************************************
📄 SimulStreaming (https://github.com/ufal/SimulStreaming) Licence

SimulStreaming is dual-licensed:

🔹 Non-Commercial Use

You may use SimulStreaming under the PolyForm Noncommercial License 1.0.0 if you obtain the code through the GitHub repository. This license is free of charge and comes with no obligations for non-commercial users.

🔸 Commercial Use

Understanding who uses SimulStreaming commercially helps us improve and prioritize development. Therefore, we want to require registration of those who acquire a commercial licence.

We plan to make the commercial licences affordable to SMEs and individuals. We are considering providing commercial licenses either for free or for a symbolic one-time fee, and maybe also providing additional support. You can share your preference via the questionnaire.

You can also leave your contact there to be notified when the commercial licenses become available.

βœ‰οΈ Contact

Dominik Macháček, [email protected]
********************************************************************************
100%|█████████████████████████████████████| 2.88G/2.88G [07:42<00:00, 6.67MiB/s]
Warming up SimulStreaming with /root/data/test_weidel.wav
WARNING:whisperlivekit.whisper_streaming_custom.backends:SimulStreaming warmup failed: Cannot set attribute 'src' directly. Use '_unsafe_update_src()' and manually clear .hash of all callersinstead.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: 192.168.10.10:49528 - "GET / HTTP/1.1" 200 OK
INFO: 192.168.10.10:49528 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO: ('192.168.10.10', 49543) - "WebSocket /asr" [accepted]
INFO:whisperlivekit.basic_server:WebSocket connection opened.
INFO: connection open
INFO:whisperlivekit.audio_processor:ASR processing: internal_buffer=0.00s, lag=1.49s.
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:SimulStreaming processing error: The size of tensor a (8) must match the size of tensor b (4) at non-singleton dimension 3
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:Error details: RuntimeError: The size of tensor a (8) must match the size of tensor b (4) at non-singleton dimension 3
INFO:whisperlivekit.audio_processor:ASR processing: internal_buffer=0.00s, lag=1.97s.
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:SimulStreaming processing error: The size of tensor a (12) must match the size of tensor b (4) at non-singleton dimension 3
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:Error details: RuntimeError: The size of tensor a (12) must match the size of tensor b (4) at non-singleton dimension 3
INFO:whisperlivekit.audio_processor:ASR processing: internal_buffer=0.00s, lag=1.44s.
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:SimulStreaming processing error: The size of tensor a (16) must match the size of tensor b (4) at non-singleton dimension 3
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:Error details: RuntimeError: The size of tensor a (16) must match the size of tensor b (4) at non-singleton dimension 3
INFO:whisperlivekit.audio_processor:ASR processing: internal_buffer=0.00s, lag=1.94s.
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:SimulStreaming processing error: The size of tensor a (20) must match the size of tensor b (4) at non-singleton dimension 3
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:Error details: RuntimeError: The size of tensor a (20) must match the size of tensor b (4) at non-singleton dimension 3
INFO:whisperlivekit.audio_processor:ASR processing: internal_buffer=0.00s, lag=1.44s.
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:SimulStreaming processing error: The size of tensor a (24) must match the size of tensor b (4) at non-singleton dimension 3
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:Error details: RuntimeError: The size of tensor a (24) must match the size of tensor b (4) at non-singleton dimension 3
INFO:whisperlivekit.audio_processor:ASR processing: internal_buffer=0.00s, lag=1.95s.
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:SimulStreaming processing error: The size of tensor a (28) must match the size of tensor b (4) at non-singleton dimension 3
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:Error details: RuntimeError: The size of tensor a (28) must match the size of tensor b (4) at non-singleton dimension 3
INFO:whisperlivekit.audio_processor:ASR processing: internal_buffer=0.00s, lag=1.45s.
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:SimulStreaming processing error: The size of tensor a (32) must match the size of tensor b (4) at non-singleton dimension 3
ERROR:whisperlivekit.whisper_streaming_custom.online_asr:Error details: RuntimeError: The size of tensor a (32) must match the size of tensor b (4) at non-singleton dimension 3
INFO:whisperlivekit.audio_processor:Empty audio message received, initiating stop sequence.
INFO:whisperlivekit.audio_processor:FFmpeg is stopped
INFO:whisperlivekit.audio_processor:FFmpeg stdout processing finished. Signaling downstream processors.
DEBUG:whisperlivekit.audio_processor:Sentinel put into transcription_queue.
DEBUG:whisperlivekit.audio_processor:Transcription processor received sentinel. Finishing.
INFO:whisperlivekit.audio_processor:Transcription processor task finished.
WARNING:whisperlivekit.audio_processor:AudioProcessor is stopping. Ignoring incoming audio.
INFO:whisperlivekit.audio_processor:Results formatter: All upstream processors are done and in stopping state. Terminating.
INFO:whisperlivekit.basic_server:Results generator finished. Sending 'ready_to_stop' to client.
INFO:whisperlivekit.basic_server:WebSocket disconnected by client during message receiving loop.
INFO:whisperlivekit.basic_server:Cleaning up WebSocket endpoint...
INFO:whisperlivekit.audio_processor:Starting cleanup of AudioProcessor resources.
INFO: connection closed
INFO:whisperlivekit.audio_processor:Watchdog task cancelled.
INFO:whisperlivekit.audio_processor:All processing tasks cancelled or finished.
INFO:whisperlivekit.audio_processor:FFmpeg manager stopped.
INFO:whisperlivekit.audio_processor:AudioProcessor cleanup complete.
INFO:whisperlivekit.basic_server:WebSocket endpoint cleaned up successfully.

The model is loaded (it is using the expected amount of VRAM, as shown by nvtop).

Drasek · Jul 19 '25

I am experiencing this issue too, with the following arguments:

--backend simulstreaming --model large-v3-turbo --frame-threshold 25 --vac --diarization --segmentation-model pyannote/segmentation-3.0 --embedding-model speechbrain/spkrec-ecapa-voxceleb

This issue does not occur with diarization disabled whilst utilising the SimulStreaming backend.

Potentially related to #e1d4bf7 or #62bf289 in whisperlivekit/whisper_streaming_custom/online_asr.py.

Audio is sent via microphone using the built-in default web UI with the default chunk size of 1000 ms.

@QuentinFuxa Any thoughts on this?

Cheers.

callumgarven · Jul 29 '25

Hi, the license problem is solved, and the latest commit, https://github.com/QuentinFuxa/WhisperLiveKit/commit/8a5e2adb1e9971c9717aba14c9a7f472efaf9f3b, solves the warmup issue.

Regarding the PyTorch/tensor issue, I have not managed to reproduce it, but I have improved the logs, so if you still encounter it, could you share the logs with me? Thanks.

QuentinFuxa · Jul 31 '25

For me, this was due to an issue with RTX 5000 series cards. Specifically, whisperlivekit/simul_whisper/whisper/triton_ops.py needs to be updated to support the latest Triton versions. See this commit in openai/whisper, where they adjust it to support both old and new versions: https://github.com/openai/whisper/commit/86899243e9fd1047a04a0e3991ef4b239c639d56.
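For illustration, the shape of that change is roughly the following. This is only a sketch pieced together from the warmup error above and the linked openai/whisper commit, not the exact upstream diff; update_kernel_source is just a name for this example.

def update_kernel_source(kernel, new_src):
    """Rewrite a triton.JITFunction's source on both old and new Triton versions (sketch)."""
    if hasattr(kernel, "_unsafe_update_src"):
        # Newer Triton: .src is read-only; use the dedicated updater and, as the
        # error message suggests, clear the cached hash so the kernel is rebuilt.
        kernel._unsafe_update_src(new_src)
        kernel.hash = None
    else:
        # Older Triton: the source attribute can be assigned directly.
        kernel.src = new_src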

For me, this occurred because I had adjusted this line, https://github.com/QuentinFuxa/WhisperLiveKit/blob/9dcfb389675a1d0ed9b9f43495de64ff657dd21a/Dockerfile#L27, in the WhisperLiveKit Dockerfile to use CU128 for RTX 5000 support.

The warmup file loads fine with this change.

Hope this helps. @Drasek @QuentinFuxa

Cheers.

callumgarven · Aug 01 '25

On a further note, the Dockerfile included with the project is missing build-essential from the system dependencies, which is required for JIT compilation of the Triton kernels (to my understanding); additionally, python3-dev is needed for the Python headers during compilation.
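Roughly what I mean, as a sketch against a Debian/Ubuntu-based image; the actual package list in the project's Dockerfile will differ, and only build-essential and python3-dev are the additions being suggested here:

# Sketch: add the C toolchain and Python headers that Triton's JIT compilation needs.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        python3-dev \
    && rm -rf /var/lib/apt/lists/*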

Otherwise, the message at https://github.com/ufal/SimulStreaming/blob/4a3e7aa7b8748c08012d5be2ffe9ae9e2734ad63/simul_whisper/whisper/timing.py#L42 is shown in the console (this really needs to expose the underlying exception, as the current message is misleading: it can mean either that C dependencies are missing or that the Python build dependencies are missing, etc.).
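Concretely, something like the following would make that fallback message actionable. This is a sketch only: I am assuming the message is emitted from a try/except around the Triton kernel import, the module path is inferred from the file mentioned above, and median_filter_cuda is the kernel name used in upstream whisper, which may differ here.

import warnings

def load_median_filter():
    # Sketch: prefer the Triton kernel, but if it cannot be set up, include the
    # actual exception in the warning instead of a generic fallback message.
    try:
        from whisperlivekit.simul_whisper.whisper.triton_ops import median_filter_cuda
        return median_filter_cuda
    except Exception as exc:  # e.g. missing build-essential or python3-dev
        warnings.warn(
            "Failed to launch Triton kernels, falling back to a slower implementation "
            f"({type(exc).__name__}: {exc})"
        )
        return None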

Cheers.

callumgarven · Aug 01 '25

Trying more things at the moment with version 0.23:

  • pip install whisperlivekit
  • pip install torch
  • pip install mosestokenizer wtpsplit
  • pip install diart
  • pip install whisperlivekit[simulstreaming]

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.7.1 requires torch==2.7.1, but you have torch 2.3.1 which is incompatible.
torchvision 0.22.1 requires torch==2.7.1, but you have torch 2.3.1 which is incompatible.
Successfully installed nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvtx-cu12-12.1.105 torch-2.3.1 triton-2.3.1

  • whisperlivekit-server --backend simulstreaming --model large-v3 --language de --warmup-file /root/data/test_weidel.wav --host 0.0.0.0 --port 8000

INFO: Started server process [29169]
INFO: Waiting for application startup.
ERROR: Traceback (most recent call last):
  File "/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/starlette/routing.py", line 692, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/root/.pyenv/versions/3.12.9/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/whisperlivekit/basic_server.py", line 20, in lifespan
    transcription_engine = TranscriptionEngine(
  File "/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/whisperlivekit/core.py", line 81, in __init__
    self.asr, self.tokenizer = backend_factory(self.args)
  File "/root/.pyenv/versions/3.12.9/lib/python3.12/site-packages/whisperlivekit/whisper_streaming_custom/whisper_online.py", line 75, in backend_factory
    raise SIMULSTREAMING_ERROR_AND_INSTALLATION_INSTRUCTIONS
ImportError: SimulStreaming dependencies are not available. Please install WhisperLiveKit using pip install "whisperlivekit[simulstreaming]"

ERROR: Application startup failed. Exiting.

If I uninstall torch and install it again, I do not get the error while installing simulstreaming, but the error when I try to run the server is the same. And to be honest, it would be nice to have the torch 2.3.1 vs 2.7.1 conflict solved (running on an RTX 3090). Anything I can try?
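A quick check of what actually ended up in the environment can help narrow this down. This is a sketch; the exact modules the simulstreaming extra imports may differ:

import importlib

# Sketch: print the versions of the packages the SimulStreaming backend relies on,
# from the same interpreter that runs whisperlivekit-server.
for name in ("torch", "torchaudio", "torchvision", "triton"):
    try:
        mod = importlib.import_module(name)
        print(f"{name}: {getattr(mod, '__version__', 'unknown')}")
    except ImportError as exc:
        print(f"{name}: not importable ({exc})")

try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    pass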

Drasek · Aug 06 '25

Hey, I have revamped the SimulStreaming backend files and classes in release 0.2.5, which I hope solves the issue, if you want to give it a try.

QuentinFuxa · Aug 13 '25

I'm facing the same issue, but slightly different: I have installed 0.2.5 and had a successful session. However, if I reconnect via the browser, it crashes with Size of tensor [different per audio segment] must match size of tensor b (4) at non-singleton dimension 3.

chris-aeviator · Aug 15 '25

Hi @chris-aeviator, thank you for the feedback. The issue has been solved in the latest commit, https://github.com/QuentinFuxa/WhisperLiveKit/commit/1652db9a2d8e96693b581cb2119a5bbe18e28f5c, and will be in the next release.

QuentinFuxa · Aug 15 '25