text-generation-webui whisper

Describe the bug

After enabling both the silero_tts and whisper_stt extensions in the "Interface mode" tab, then applying and restarting the interface, whisper_stt shows an "Error" message when I try to use the microphone to record a prompt. No user input is displayed, and the assistant immediately replies with a seemingly random voice response.

Is there an existing issue for this?

[X] I have searched the existing issues

Reproduction

Enable both silero_tts and whisper_stt.
Record a prompt.

Screenshot

[Screenshot: 2023-04-13 12_21_06-Text generation web UI]

Logs

Starting the web UI...
Warning: --cai-chat is deprecated. Use --chat instead.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: D:\Tools\oobabooga-windows\installer_files\env\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll...
Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
Found the following quantized model: models\anon8231489123_vicuna-13b-GPTQ-4bit-128g\vicuna-13b-4bit-128g.safetensors
Loading model ...
D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\safetensors\torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
Done.
Loaded the model in 4.10 seconds.
Loading the extension "gallery"... Ok.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Closing server running on port: 7860
Loading the extension "gallery"... Ok.
Loading the extension "silero_tts"...
Using Silero TTS cached checkpoint found at C:\Users\anahum/.cache\torch\hub
Ok.
Loading the extension "whisper_stt"... Ok.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\pydub\utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
  warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
Traceback (most recent call last):
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\processing_utils.py", line 138, in audio_from_file
    audio = AudioSegment.from_file(filename)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\pydub\audio_segment.py", line 728, in from_file
    info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\pydub\utils.py", line 274, in mediainfo_json
    res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\subprocess.py", line 1440, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 1106, in process_api
    inputs = self.preprocess_data(fn_index, inputs, state)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 995, in preprocess_data
    processed_input.append(block.preprocess(inputs[i]))
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\components.py", line 2306, in preprocess
    sample_rate, data = processing_utils.audio_from_file(
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\processing_utils.py", line 148, in audio_from_file
    raise RuntimeError(msg) from e
RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH.
Output generated in 8.13 seconds (7.13 tokens/s, 58 tokens, context 69, seed 1632075903)
Traceback (most recent call last):
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\processing_utils.py", line 138, in audio_from_file
    audio = AudioSegment.from_file(filename)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\pydub\audio_segment.py", line 728, in from_file
    info = mediainfo_json(orig_file, read_ahead_limit=read_ahead_limit)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\pydub\utils.py", line 274, in mediainfo_json
    res = Popen(command, stdin=stdin_parameter, stdout=PIPE, stderr=PIPE)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\subprocess.py", line 1440, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 1106, in process_api
    inputs = self.preprocess_data(fn_index, inputs, state)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 995, in preprocess_data
    processed_input.append(block.preprocess(inputs[i]))
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\components.py", line 2306, in preprocess
    sample_rate, data = processing_utils.audio_from_file(
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\site-packages\gradio\processing_utils.py", line 148, in audio_from_file
    raise RuntimeError(msg) from e
RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH.
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "D:\Tools\oobabooga-windows\installer_files\env\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Output generated in 3.27 seconds (7.96 tokens/s, 26 tokens, context 128, seed 1574481083)

System Info

NVIDIA RTX 3090

Apr 13 '23 09:04 assafna
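
For anyone reading the traceback above: it ends in FileNotFoundError because pydub tries to launch ffprobe as a subprocess and Windows cannot find the executable. The following is a minimal Python sketch of that same failure mode, independent of the web UI. The helper name tool_version is hypothetical and is not part of gradio or pydub.

```python
import subprocess


def tool_version(name):
    """Return the first line printed by `<name> -version`, or None if the
    executable cannot be launched (the FileNotFoundError case in the log)."""
    try:
        result = subprocess.run(
            [name, "-version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except FileNotFoundError:
        # Same root cause as "[WinError 2] The system cannot find the file specified".
        return None
    except subprocess.TimeoutExpired:
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    for tool in ("ffmpeg", "ffprobe"):
        print(tool, "->", tool_version(tool))
```

If this prints None for ffprobe when run with the same Python environment the web UI uses, the recording error is expected regardless of which extensions are enabled.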

I have the same issue with this extension.

Apr 13 '23 17:04 KirillRepinArt

I'm also running into this (Windows 11, RTX 4090, one-click installer). I've tried a few things, such as installing ffmpeg at the system level and adding it to PATH, and adding a bin folder to the Oobabooga directory and adding that to PATH, but I still get the error with Whisper (regardless of whether Silero is activated). Funnily enough, I'm seeing the exact same error in a Stable Diffusion Automatic1111 extension (SadTalker) because of the same non-WAV audio ffmpeg dependency. I wonder if it's a Gradio thing, as the same error appears in this issue: https://github.com/gradio-app/gradio/issues/3429

Apr 15 '23 14:04 michaelpick
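
When ffmpeg has been added to PATH but the error persists, it can help to check what the Python process itself can see, since the PATH of the web UI's environment may differ from the one edited in the system settings. A small sketch using the standard library's shutil.which follows. The function name find_tools is hypothetical, and the check is only meaningful if it runs in the same environment as the web UI (for example from the shell opened by cmd_windows.bat).

```python
import shutil


def find_tools(names=("ffmpeg", "ffprobe"), path=None):
    """Map each tool name to its resolved location, or None if it is not
    discoverable on the given search path (defaults to the process PATH)."""
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        status = location if location else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```

A None entry for ffprobe means the directory containing ffprobe.exe is not on the PATH that this process inherited.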

same issue here guys! :(

Apr 23 '23 19:04 Leggyweggy

The same issue. How can we make it work?

Apr 27 '23 12:04 Kvento

Ok, I found a solution: https://phoenixnap.com/kb/ffmpeg-windows I did this and it works for me.

Apr 27 '23 16:04 Kvento

I had to restart my computer after installing ffmpeg as above; otherwise it worked well.

May 01 '23 18:05 ized3d
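
A likely reason a restart was needed: a process receives a copy of the environment when it starts, so a PATH change made afterwards is not visible to terminals or servers that are already running. Opening a new terminal is often enough, though that is an assumption about this setup. As a sketch of a workaround inside Python, the directory can be prepended to PATH for the current process before any audio library spawns ffprobe. The helper prepend_to_path and the directory C:\ffmpeg\bin are hypothetical.

```python
import os


def prepend_to_path(directory, path_value, sep=os.pathsep):
    """Return path_value with directory placed first, unless it is already listed."""
    entries = [entry for entry in path_value.split(sep) if entry]
    wanted = os.path.normcase(directory)
    if any(os.path.normcase(entry) == wanted for entry in entries):
        return path_value
    return sep.join([directory] + entries)


if __name__ == "__main__":
    ffmpeg_bin = r"C:\ffmpeg\bin"  # hypothetical install location, adjust as needed
    os.environ["PATH"] = prepend_to_path(ffmpeg_bin, os.environ.get("PATH", ""))
    print(os.environ["PATH"].split(os.pathsep)[0])
```

This only affects the current process and the child processes it starts; it does not change the system-wide setting.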

Could you guys tell me where you cloned the FFmpeg repository to, if it matters? Also, do I have to set it as a system variable at all? I'm going to mess around for the time being and find out.

May 01 '23 19:05 Leggyweggy

Quoting Kvento: "Ok, I found a solution: https://phoenixnap.com/kb/ffmpeg-windows I did this and it works for me."

I did that and the previous error disappeared, but now I get another one when I try to record with whisper.

Traceback (most recent call last):
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\gradio\routes.py", line 427, in run_predict
    output = await app.get_blocks().process_api(
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 1323, in process_api
    result = await self.call_function(
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 1051, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "H:\oobabooga_windows\text-generation-webui\extensions\whisper_stt\script.py", line 48, in auto_transcribe
    transcription = do_stt(audio, whipser_model, whipser_language)
  File "H:\oobabooga_windows\text-generation-webui\extensions\whisper_stt\script.py", line 36, in do_stt
    transcription = r.recognize_whisper(audio_data, language=whipser_language, model=whipser_model)
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\speech_recognition\__init__.py", line 1479, in recognize_whisper
    import whisper
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\whisper\__init__.py", line 13, in <module>
    from .model import ModelDimensions, Whisper
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\whisper\model.py", line 13, in <module>
    from .transcribe import transcribe as transcribe_function
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\whisper\transcribe.py", line 20, in <module>
    from .timing import add_word_timestamps
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\whisper\timing.py", line 7, in <module>
    import numba
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\numba\__init__.py", line 55, in <module>
    _ensure_critical_deps()
  File "H:\oobabooga_windows\installer_files\env\lib\site-packages\numba\__init__.py", line 42, in _ensure_critical_deps
    raise ImportError("Numba needs NumPy 1.24 or less")
ImportError: Numba needs NumPy 1.24 or less

Aug 09 '23 16:08 Zombie-Dude
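
The second traceback is a separate problem from ffmpeg: importing whisper pulls in numba, and the installed numba refuses to load because the installed NumPy is newer than it supports. A rough Python sketch of the same version comparison follows. The function names are hypothetical, and the 1.24 limit is taken only from the error message above, so other numba releases may have a different limit.

```python
from importlib import metadata


def parse_major_minor(version):
    """Turn a version string such as '1.25.2' into the tuple (1, 25)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def numpy_too_new_for_numba(numpy_version, max_supported=(1, 24)):
    """True if numpy_version exceeds the limit quoted in the ImportError."""
    return parse_major_minor(numpy_version) > max_supported


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed in this environment")
    else:
        verdict = "too new" if numpy_too_new_for_numba(installed) else "within the limit"
        print(f"numpy {installed} is {verdict} for the numba in the traceback")
```

Run in the web UI's environment, this shows whether the installed NumPy is above the version the error message names.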

I asked the "TheBloke_Octocoder-GPTQ" model for help with the error Zombie-Dude posted above; it said miniconda doesn't have all the required files. I ran cmd_windows.bat from the oobabooga folder, then pasted the model's suggestion: conda install -c conda-forge librosa. That works for me (I haven't run update_windows.bat to see if it breaks again).

Aug 24 '23 01:08 Muchaelsandall

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

Oct 07 '23 23:10 github-actions[bot]

This is still an issue for some. This post provides the only viable solution to this issue, as the oobabooga install instructions do not provide this insight and some of the ffmpeg install methods seem to have issues.

This post provided the link to the correct instructions.

Dec 13 '23 16:12 newbe65

conda install ffmpeg solved this issue for me on Windows.

Jan 04 '24 23:01 noobmaster29


whisper_stt "Error"
