WhisperLiveKit SimulStreaming processing error: string index out of range

wlk  | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 0.60s | last_end = 52.88399999999999 |
wlk  | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
wlk  | Detected language: ko with p=0.4471
wlk  | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
wlk  | ERROR:whisperlivekit.simul_whisper.backend:SimulStreaming processing error: string index out of range
wlk  | Traceback (most recent call last):
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/backend.py", line 113, in process_iter
wlk  |     timestamped_words = self.model.infer(is_last=is_last)
wlk  |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
wlk  |     return func(*args, **kwargs)
wlk  |            ^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/simul_whisper.py", line 572, in infer
wlk  |     split_words, split_tokens = self.tokenizer.split_to_word_tokens(new_hypothesis)
wlk  |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/whisper/tokenizer.py", line 284, in split_to_word_tokens
wlk  |     return self.split_tokens_on_spaces(tokens)
wlk  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/whisper/tokenizer.py", line 312, in split_tokens_on_spaces
wlk  |     subwords, subword_tokens_list = self.split_tokens_on_unicode(tokens)
wlk  |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/whisper/tokenizer.py", line 301, in split_tokens_on_unicode
wlk  |     or decoded_full[unicode_offset + decoded.index(replacement_char)]
wlk  |        ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  | IndexError: string index out of range
wlk  | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
wlk  | ERROR:whisperlivekit.simul_whisper.backend:SimulStreaming processing error: The size of tensor a (23) must match the size of tensor b (10) at non-singleton dimension 3
wlk  | Traceback (most recent call last):
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/backend.py", line 113, in process_iter
wlk  |     timestamped_words = self.model.infer(is_last=is_last)
wlk  |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
wlk  |     return func(*args, **kwargs)
wlk  |            ^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/simul_whisper.py", line 474, in infer
wlk  |     logits = self.logits(tokens_for_logits, encoder_feature) # B, len(tokens), token dict size
wlk  |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/simul_whisper.py", line 249, in logits
wlk  |     logit = self.inference.logits(tokens, audio_features)
wlk  |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/beam.py", line 17, in logits
wlk  |     return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
wlk  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
wlk  |     return self._call_impl(*args, **kwargs)
wlk  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk  |   File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl

I was speaking Spanish in a falsetto voice, and it detected the language as Korean. As soon as it did, it crashed with the stack trace above.

The closest issue I've found is https://github.com/QuentinFuxa/WhisperLiveKit/issues/152, but I think it's unrelated.
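For anyone digging into the first traceback, below is a minimal standalone sketch of one way the expression `decoded_full[unicode_offset + decoded.index(replacement_char)]` can go out of range. It is not WhisperLiveKit's code: `decode`, `split_on_unicode` and the `guard` flag are hypothetical names, and each "token" is modelled as a raw byte chunk decoded as UTF-8 with `errors="replace"`. Hangul syllables take 3 bytes in UTF-8, so if a hypothesis is cut after only part of a character, each partial token decodes to U+FFFD on its own while the full string collapses them into a single U+FFFD. `unicode_offset` then drifts past the end of `decoded_full`. Whether this is what actually happens in SimulStreaming is an assumption, not something confirmed in this thread.

```python
REPLACEMENT_CHAR = "\ufffd"


def decode(tokens: list[bytes]) -> str:
    # Hypothetical stand-in for the tokenizer's decode: each "token" is a raw
    # byte chunk; invalid or incomplete UTF-8 sequences become U+FFFD.
    return b"".join(tokens).decode("utf-8", errors="replace")


def split_on_unicode(tokens: list[bytes], guard: bool = True):
    """Group byte tokens into complete characters, mirroring the loop shape of
    split_tokens_on_unicode. With guard=False the unchecked index from the
    traceback is used and can raise IndexError."""
    decoded_full = decode(tokens)
    words: list[str] = []
    word_tokens: list[list[bytes]] = []
    current: list[bytes] = []
    unicode_offset = 0

    for token in tokens:
        current.append(token)
        decoded = decode(current)
        if REPLACEMENT_CHAR in decoded:
            pos = unicode_offset + decoded.index(REPLACEMENT_CHAR)
            if guard and pos >= len(decoded_full):
                # Hypothetical guard: the offset drifted past the end of the
                # full decode, so flush instead of indexing out of range.
                complete = True
            else:
                # Same check as in the traceback: flush only if the full
                # decode also has U+FFFD at this position.
                complete = decoded_full[pos] == REPLACEMENT_CHAR
        else:
            complete = True
        if complete:
            words.append(decoded)
            word_tokens.append(current)
            current = []
            unicode_offset += len(decoded)

    return words, word_tokens


if __name__ == "__main__":
    hangul = "가".encode("utf-8")  # one Hangul syllable = 3 UTF-8 bytes
    # All three bytes present: the pieces are grouped back into one character.
    print(split_on_unicode([hangul[:1], hangul[1:2], hangul[2:]]))
    # prints (['가'], [[b'\xea', b'\xb0', b'\x80']])

    truncated = [hangul[:1], hangul[1:2]]  # hypothesis cut mid-character
    print(split_on_unicode(truncated))  # guarded: two U+FFFD pieces, no crash
    try:
        split_on_unicode(truncated, guard=False)
    except IndexError as exc:
        print("unguarded:", exc)  # prints "unguarded: string index out of range"
```

Flushing on an out-of-range offset is a judgment call that keeps every token in some output piece. Continuing to accumulate would be another option, but then a trailing partial character would be silently dropped.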

Oct 02 '25 13:10 Damrod

Hi, that is probably unrelated. I will look into it.

Oct 06 '25 17:10 QuentinFuxa

Any progress? I encountered the same error when using whisper-tiny.

Thank you.

Nov 25 '25 08:11 cnlinxi