Bug report on Windows: ffmpeg Permission denied when using recognize_whisper (with proposed workaround)

Open julienbarbaud opened this issue 2 years ago • 3 comments

Steps to reproduce

 import speech_recognition as sr
 r=sr.Recognizer()
 with sr.Microphone() as m:
     audio=r.record(m,duration=5)

 r.recognize_whisper(audio)

Expected behaviour

I expected recognize_whisper to return a transcription of the recorded audio.

Actual behaviour

Throws an ffmpeg.Error indicating that permission was denied, even though ffmpeg does have permission to access the temp directory (tested independently). Whisper also works when used standalone (directly, not through the speech_recognition library).

ffmpeg version 5.1.2-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12.1.0 (Rev2, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
C:\Users\asus\Temp\tmpuqsd56vv.wav: Permission denied

System information

My system is Windows 10 x64.

My Python version is 3.10.9.

Identified Issue

In audio.py, the following line of the load_audio function is problematic on Windows: ffmpeg.input(file, threads=0). The file passed to ffmpeg is actually the name of a tempfile.NamedTemporaryFile() that is still open (and can't simply be closed first, because by default closing it deletes it). The Python docs specify:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

This problem is easy to miss when testing on Unix, but Windows will not let the temporary file be opened a second time while it is still open (here, by ffmpeg via ffmpeg.input).
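One common way around the restriction quoted above is to create the named temporary file with delete=False, close it, and only then open it by name. The following is a minimal sketch of that pattern, not code from speech_recognition or Whisper: the function name write_then_reopen is hypothetical, and the second open is done by Python here where the issue has ffmpeg doing it.

```python
import os
import tempfile


def write_then_reopen(data: bytes) -> bytes:
    """Write data to a named temp file, close it, then reopen it by name."""
    # delete=False keeps the file on disk after close(), so the handle can
    # be released before anything else opens the path.
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    try:
        tmp.write(data)
        tmp.close()  # release the handle; required on Windows before reopening
        with open(tmp.name, "rb") as reopened:
            return reopened.read()
    finally:
        tmp.close()  # harmless if the file is already closed
        os.remove(tmp.name)  # with delete=False, cleanup is our job
```

The cost of this pattern is that the caller becomes responsible for deleting the file, hence the finally block.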

Proposed workaround:

I got things working by modifying __init__.py, where recognize_whisper passes the temporary file along. Original code:

 with tempfile.NamedTemporaryFile(suffix=".wav") as f:
     f.write(audio_data.get_wav_data())
     f.flush()
     result = self.whisper_model[model].transcribe(
         f.name,
         language=language,
         task="translate" if translate else None,
         fp16=torch.cuda.is_available(),
         **transcribe_options
     )

Instead, I used a temporary directory, in which I created a file that can be closed before its name is passed to the next function. This avoids the conflict between two open handles on the same file. It feels a bit backwards, but it worked for me:

 with tempfile.TemporaryDirectory() as tempdir:
     # os.path.join avoids hard-coding the Windows separator (needs `import os`)
     fname = os.path.join(tempdir, "atotallyrandomname.wav")
     with open(fname, "wb") as f:
         f.write(audio_data.get_wav_data())
     # f is closed at this point, so ffmpeg can open the file by name
     result = self.whisper_model[model].transcribe(
         fname,
         language=language,
         task="translate" if translate else None,
         fp16=torch.cuda.is_available(),
         **transcribe_options
     )

julienbarbaud, Feb 03 '23 21:02

Hit that too. Any chance of a fix? I'd like to remove my monkeypatch.

Ixiodor, Feb 09 '23 21:02

I was having a similar issue and I'm glad someone else had the same issue. I did test your solution and it did work. Thank you. Now whisper recognizes voice from my microphone.

Quin4l, Feb 09 '23 22:02

This was fixed with #647: no more temporary files for Whisper, everything is done in memory.

nbrlt, Feb 19 '23 14:02
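For readers wondering what an in-memory route can look like: Whisper's transcribe accepts audio as an array of samples as well as a file path, so WAV bytes can be decoded without touching the disk. The sketch below is only an illustration of that idea using the standard library, and is not necessarily how #647 implements it. The function name wav_bytes_to_floats is hypothetical, it assumes mono 16-bit PCM, and the final step of wrapping the samples in a float32 NumPy array for Whisper is left out to keep the example dependency-free.

```python
import array
import io
import sys
import wave


def wav_bytes_to_floats(wav_bytes: bytes) -> tuple[list[float], int]:
    """Decode 16-bit PCM WAV bytes into floats in [-1.0, 1.0) plus the sample rate."""
    # io.BytesIO lets the wave module read from memory instead of a file path.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Scale to the [-1.0, 1.0) range; channels, if several, stay interleaved.
    return [s / 32768.0 for s in samples], rate
```

Because nothing is written to disk, the Windows restriction on reopening an open temporary file never comes into play.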