WhisperLiveKit
SimulStreaming processing error: string index out of range
wlk | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s | + Silence of = 0.60s | last_end = 52.88399999999999 |
wlk | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
wlk | Detected language: ko with p=0.4471
wlk | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
wlk | ERROR:whisperlivekit.simul_whisper.backend:SimulStreaming processing error: string index out of range
wlk | Traceback (most recent call last):
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/backend.py", line 113, in process_iter
wlk | timestamped_words = self.model.infer(is_last=is_last)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
wlk | return func(*args, **kwargs)
wlk | ^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/simul_whisper.py", line 572, in infer
wlk | split_words, split_tokens = self.tokenizer.split_to_word_tokens(new_hypothesis)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/whisper/tokenizer.py", line 284, in split_to_word_tokens
wlk | return self.split_tokens_on_spaces(tokens)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/whisper/tokenizer.py", line 312, in split_tokens_on_spaces
wlk | subwords, subword_tokens_list = self.split_tokens_on_unicode(tokens)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/whisper/tokenizer.py", line 301, in split_tokens_on_unicode
wlk | or decoded_full[unicode_offset + decoded.index(replacement_char)]
wlk | ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | IndexError: string index out of range
wlk | INFO:whisperlivekit.audio_processor:internal_buffer=0.00s | lag=0.00s |
wlk | ERROR:whisperlivekit.simul_whisper.backend:SimulStreaming processing error: The size of tensor a (23) must match the size of tensor b (10) at non-singleton dimension 3
wlk | Traceback (most recent call last):
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/backend.py", line 113, in process_iter
wlk | timestamped_words = self.model.infer(is_last=is_last)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
wlk | return func(*args, **kwargs)
wlk | ^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/simul_whisper.py", line 474, in infer
wlk | logits = self.logits(tokens_for_logits, encoder_feature) # B, len(tokens), token dict size
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/simul_whisper.py", line 249, in logits
wlk | logit = self.inference.logits(tokens, audio_features)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/whisperlivekit/simul_whisper/beam.py", line 17, in logits
wlk | return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
wlk | return self._call_impl(*args, **kwargs)
wlk | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
wlk | File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
I was speaking Spanish in a falsetto voice, and it decided that I was speaking Korean. As soon as it did, it crashed with the stack trace above.
The closest issue I've found to this one is https://github.com/QuentinFuxa/WhisperLiveKit/issues/152, but I think it's unrelated.
Hi, that is probably unrelated. I will look at it.
Any progress? I encountered the same error when using whisper-tiny.
Thank you
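For anyone hitting the first traceback: the IndexError is raised by the lookup `decoded_full[unicode_offset + decoded.index(replacement_char)]` in `split_tokens_on_unicode`, so the computed position must be at or past the end of `decoded_full`. Below is a minimal sketch of a bounds-checked variant. It is a standalone illustration, not the library's code: the name `split_tokens_on_unicode_safe`, the `decode` parameter, and the byte-level `byte_decode` helper are hypothetical, and the loop around the failing line is an assumption based only on the names visible in the traceback.

```python
from typing import Callable, List, Sequence, Tuple

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, emitted when a byte sequence is incomplete


def split_tokens_on_unicode_safe(
    tokens: Sequence[int],
    decode: Callable[[List[int]], str],  # hypothetical stand-in for the tokenizer's decode
) -> Tuple[List[str], List[List[int]]]:
    """Group tokens into pieces that decode to complete unicode text."""
    decoded_full = decode(list(tokens))
    words: List[str] = []
    word_tokens: List[List[int]] = []
    current: List[int] = []
    unicode_offset = 0

    for token in tokens:
        current.append(token)
        decoded = decode(current)
        if REPLACEMENT_CHAR in decoded:
            pos = unicode_offset + decoded.index(REPLACEMENT_CHAR)
            # Bounds check: only index decoded_full when pos is valid.
            genuine = pos < len(decoded_full) and decoded_full[pos] == REPLACEMENT_CHAR
            if not genuine:
                continue  # character still incomplete, wait for more tokens
        words.append(decoded)
        word_tokens.append(current)
        unicode_offset += len(decoded)
        current = []

    # Choice made for this sketch: keep trailing tokens instead of dropping them.
    if current:
        words.append(decode(current))
        word_tokens.append(current)
    return words, word_tokens


def byte_decode(toks: List[int]) -> str:
    # Toy decoder for the example: every token is one UTF-8 byte.
    return bytes(toks).decode("utf-8", errors="replace")


words, groups = split_tokens_on_unicode_safe(list("né".encode("utf-8")), byte_decode)
print(words, groups)
```

With the toy decoder, the two bytes of "é" end up in one group because the first byte alone decodes to U+FFFD and is held back. This only avoids the crash; it says nothing about why the offsets drift apart in the first place, and it does not address the second traceback (the tensor size mismatch).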