
After recording audio with the mic and completing a first generation, clicking Generate a second time fails with an error saying the file does not exist.

Open 1247862674 opened this issue 1 year ago • 5 comments

Source files under /tmp/gradio are deleted after the first generation

1247862674 avatar Oct 09 '24 15:10 1247862674

To create a public link, set share=True in launch().
Traceback (most recent call last):
  File "/data/miniconda3/pkgs/whisper-webui/lib/python3.10/site-packages/gradio/processing_utils.py", line 536, in audio_from_file
    audio = AudioSegment.from_file(filename)
  File "/data/miniconda3/pkgs/whisper-webui/lib/python3.10/site-packages/pydub/audio_segment.py", line 651, in from_file
    file, close_file = _fd_or_path_or_tempfile(file, 'rb', tempfile=False)
  File "/data/miniconda3/pkgs/whisper-webui/lib/python3.10/site-packages/pydub/utils.py", line 65, in _fd_or_path_or_tempfile
    fd = open(fd, mode=mode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/gradio/8b96c03f439af9d876c6206974c5ca140f7637f654b57f3f45074fabaed89087/audio.wav'

1247862674 avatar Oct 09 '24 15:10 1247862674

Hi. I just tried to reproduce this in the latest version of the WebUI and could not.

Currently, automatically cached Gradio files are cleaned up by this line:

https://github.com/jhj0517/Whisper-WebUI/blob/bc6b2e9bde036d5ed53f6697aaa9ef12d7348f5e/app.py#L24

It removes cached temp files once they are more than an hour old, checking every minute. So if you re-generate within an hour of the upload, it should work without a problem.
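To make the cleanup rule concrete, here is a minimal standard-library sketch of an age-based sweep. It is not Gradio's implementation: the function name `remove_stale_files` is hypothetical, and using the file's modification time as its "age" is an assumption made for illustration.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_files(cache_dir: Path, max_age_s: float,
                       now: Optional[float] = None) -> List[Path]:
    """Delete every file under cache_dir older than max_age_s seconds.

    Age is measured from the file's modification time (an assumption;
    the real cache may track age differently). Returns the removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(cache_dir.rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age_s:
            path.unlink()
            removed.append(path)
    return removed


# A sweep matching the behaviour described above would call
# remove_stale_files(Path("/tmp/gradio"), max_age_s=3600) once every 60 seconds.
```

Under this rule, any later step that reopens the cached path more than an hour after upload would hit the same "No such file or directory" error shown in the traceback.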

If you're using an older version of the WebUI, I recommend updating it.

jhj0517 avatar Oct 11 '24 13:10 jhj0517

Hi! I ran into the same issue.

I think I might have figured out part of the "magic" behind it. I uploaded a file and, without reloading the page in the browser, kept testing different models and parameters to see what works best. At some point, I got this error:

Traceback (most recent call last):
  File "/Whisper-WebUI/modules/diarize/audio_loader.py", line 77, in load_audio
    out = subprocess.run(cmd, capture_output=True, check=True).stdout
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', '/tmp/gradio/520c9fc797bf1d641565979d9f70954d7e4e0249f00cc100df7636b035d9b8c4/App Recording 20250408 1404.mp3', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Whisper-WebUI/modules/whisper/base_transcription_pipeline.py", line 273, in transcribe_file
    transcribed_segments, time_for_task = self.run(
                                          ^^^^^^^^^
  File "/Whisper-WebUI/modules/whisper/base_transcription_pipeline.py", line 193, in run
    result, elapsed_time_diarization = self.diarizer.run(
                                       ^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/modules/diarize/diarizer.py", line 65, in run
    audio = load_audio(audio)
            ^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/modules/diarize/audio_loader.py", line 79, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 5.1.6-0+deb12u1 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr --extra-version=0+deb12u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
/tmp/gradio/520c9fc797bf1d641565979d9f70954d7e4e0249f00cc100df7636b035d9b8c4/App Recording 20250408 1404.mp3: No such file or directory


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/queueing.py", line 625, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/blocks.py", line 2137, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1663, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2470, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/venv/lib/python3.11/site-packages/gradio/utils.py", line 890, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/Whisper-WebUI/modules/whisper/base_transcription_pipeline.py", line 318, in transcribe_file
    raise RuntimeError(f"Error transcribing file: {e}") from e
RuntimeError: Error transcribing file: Failed to load audio: ffmpeg version 5.1.6-0+deb12u1 Copyright (c) 2000-2024 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr --extra-version=0+deb12u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
/tmp/gradio/520c9fc797bf1d641565979d9f70954d7e4e0249f00cc100df7636b035d9b8c4/App Recording 20250408 1404.mp3: No such file or directory

This doesn’t solve the problem, but it might be related to what's causing it.

gebv avatar Apr 14 '25 10:04 gebv

When you upload a file via the WebUI, it gets cached automatically, and the WebUI uses the cached version rather than accessing the original source directly.

/tmp/gradio/520c9fc797bf1d641565979d9f70954d7e4e0249f00cc100df7636b035d9b8c4/App Recording 20250408 1404.mp3: No such file or directory

The error message says that the file could not be found in the temporary directory. It seems the temp file was removed somehow.

The most likely reason is that Gradio deleted it automatically because it had been uploaded more than an hour earlier.

The maximum cache age for uploaded files is currently set to 1 hour, as defined in this line.

https://github.com/jhj0517/Whisper-WebUI/blob/d70bcc79f4b4356cd6ebd145e09ee64aef9dd763/app.py#L27

Would it be better to increase it beyond 1 hour?

jhj0517 avatar Apr 14 '25 14:04 jhj0517

From my tests, transcribing takes about 5 minutes per 1–1.5 hours of audio on average. So roughly, 12 to 18 hours of audio would take around 1 hour to transcribe.
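As a rough check of that estimate, assuming the reported rate of 5 minutes per 1 to 1.5 hours of audio scales linearly (actual throughput depends on the model and hardware), the amount of audio that fits in a given time budget can be computed like this. The function name is hypothetical.

```python
from typing import Tuple


def audio_hours_within(budget_minutes: float,
                       minutes_per_batch: float = 5.0,
                       audio_hours_per_batch: Tuple[float, float] = (1.0, 1.5),
                       ) -> Tuple[float, float]:
    """Return the (low, high) hours of audio transcribable within the budget."""
    batches = budget_minutes / minutes_per_batch
    low, high = audio_hours_per_batch
    return batches * low, batches * high


low, high = audio_hours_within(60)
print(f"{low:g} to {high:g} hours of audio in 60 minutes")
# prints "12 to 18 hours of audio in 60 minutes"
```

So only very long recordings, or several runs on the same upload, would push a job past the one-hour cache age.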

Then, when the run reaches the diarization stage, the process seems to access the file again, and that is when it can crash with this error.

Would it be better to increase it beyond 1 hour?

I think increasing the cache time would reduce the chance of a long-running job failing when it accesses the file again after the cached copy has already been removed.
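A complementary idea, which was not proposed in this thread and is not something the WebUI is known to do, is to copy the cached upload into a private working directory before starting a long job, so later stages no longer depend on the cache. The sketch below uses only the standard library; `pinned_copy` is a hypothetical helper name, and it assumes the working directory lies outside the folder that the cache cleanup sweeps.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def pinned_copy(cached_path: str) -> Iterator[Path]:
    """Yield a private copy of cached_path that lives until the block exits."""
    src = Path(cached_path)
    # A fresh directory per job, assumed to be outside the swept cache folder.
    workdir = Path(tempfile.mkdtemp(prefix="whisper_job_"))
    try:
        dst = workdir / src.name
        shutil.copy2(src, dst)  # copy contents and metadata
        yield dst
    finally:
        # Remove the private copy once the job is done.
        shutil.rmtree(workdir, ignore_errors=True)


# Usage sketch: both stages read the pinned copy, not the cache entry.
# with pinned_copy(uploaded_path) as audio:
#     segments = transcribe(str(audio))   # hypothetical stage
#     result = diarize(str(audio))        # hypothetical stage
```

This costs one extra copy per job, and it would not help with the case reported earlier in the thread where the same upload is reused across many separate runs without reloading the page.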

gebv avatar Apr 14 '25 15:04 gebv