speech_recognition
Bug report on Windows: ffmpeg Permission denied when using recognize_whisper (with proposed workaround)
Steps to reproduce
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as m:
    audio = r.record(m, duration=5)
r.recognize_whisper(audio)
Expected behaviour
I expected recognize_whisper to return the transcription of the recorded audio.
Actual behaviour
Throws an ffmpeg.Error indicating that permission has been denied, even though ffmpeg has permission to access the temp directory (tested independently). Whisper also works standalone (when used directly, not through the speech_recognition library).
ffmpeg version 5.1.2-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 12.1.0 (Rev2, Built by MSYS2 project)
configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
C:\Users\asus\Temp\tmpuqsd56vv.wav: Permission denied
System information
My system is Windows 10 x64.
My Python version is 3.10.9.
Identified Issue
In Whisper's audio.py, the following line of the load_audio function is problematic on Windows:
ffmpeg.input(file, threads=0)
Indeed, the file fed into ffmpeg is actually the name of a tempfile.NamedTemporaryFile(), which is still open at that point (and can't simply be closed first, because it is deleted as soon as it is closed). The Python docs state:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
This problem is easy to miss when testing on Unix, but Windows will not let you open a named temporary file a second time while it is still open (here, through ffmpeg.input).
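For anyone who wants to confirm the platform difference without involving ffmpeg or Whisper, here is a minimal sketch. The function name can_reopen_while_open is made up for illustration. It only checks whether a still-open NamedTemporaryFile can be opened again by name, which is what the docs quote above describes.

```python
import os
import tempfile


def can_reopen_while_open() -> bool:
    """Return True if a NamedTemporaryFile can be opened again by name
    while the original handle is still open (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        f.write(b"placeholder bytes")
        f.flush()
        try:
            # Second open of the same path, similar to what an external
            # reader such as ffmpeg would attempt.
            with open(f.name, "rb") as again:
                again.read()
            return True
        except PermissionError:
            return False


if __name__ == "__main__":
    print(os.name, can_reopen_while_open())
```

On Unix this should report True. On Windows with the default arguments I would expect False, matching the "Permission denied" in the ffmpeg output.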
Proposed workaround:
I got things to work by modifying __init__.py, where recognize_whisper passes the temporary file to Whisper. Original code:
with tempfile.NamedTemporaryFile(suffix=".wav") as f:
    f.write(audio_data.get_wav_data())
    f.flush()
    result = self.whisper_model[model].transcribe(
        f.name,
        language=language,
        task="translate" if translate else None,
        fp16=torch.cuda.is_available(),
        **transcribe_options
    )
Instead, I used a temporary directory and created a file in it that can be closed before its name is passed to the next function. This avoids the conflict between open handles. It feels a bit backwards, but it worked for me:
with tempfile.TemporaryDirectory() as tempdir:
    fname = tempdir + "\\atotallyrandomname.wav"
    with open(fname, "wb") as f:
        f.write(audio_data.get_wav_data())
        f.flush()
    result = self.whisper_model[model].transcribe(
        fname,
        language=language,
        task="translate" if translate else None,
        fp16=torch.cuda.is_available(),
        **transcribe_options
    )
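The same idea can be packaged as a small context manager so it works on any OS. This is only a sketch: closed_temp_wav and its filename parameter are hypothetical names, not part of speech_recognition, and os.path.join replaces the hard-coded backslash.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def closed_temp_wav(wav_bytes: bytes, filename: str = "audio.wav"):
    """Write wav_bytes to a file inside a temporary directory, close it,
    and yield its path. The directory is removed on exit.
    (Hypothetical helper for illustration.)"""
    with tempfile.TemporaryDirectory() as tempdir:
        path = os.path.join(tempdir, filename)
        with open(path, "wb") as f:
            f.write(wav_bytes)
        # The file handle is closed here, so another process can open it.
        yield path
```

Inside recognize_whisper, the path yielded by closed_temp_wav(audio_data.get_wav_data()) would take the place of fname in the transcribe call.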
Hit that too. Any chance of a fix? I'd like to remove my monkeypatch.
I was having a similar issue and I'm glad someone else had the same issue. I did test your solution and it did work. Thank you. Now whisper recognizes voice from my microphone.
This was fixed with #647, no temporary files anymore for whisper, all in memory.
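For readers wondering what "all in memory" can look like: the sketch below is not the code from #647 (I have not reproduced its diff here), just an illustration of decoding WAV bytes without touching disk. It uses only the standard library and assumes 16-bit PCM input. wav_bytes_to_floats is a made-up name.

```python
import io
import struct
import wave


def wav_bytes_to_floats(wav_bytes: bytes) -> list[float]:
    """Decode 16-bit PCM WAV bytes in memory into floats in [-1.0, 1.0).
    Only the first channel is kept. (Hypothetical helper.)"""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    # Little-endian signed 16-bit samples, interleaved by channel.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return [s / 32768.0 for s in samples[::n_channels]]
```

Whisper's transcribe accepts a NumPy float32 array as well as a file path, so samples like these (at 16 kHz mono, wrapped in a float32 array) could be passed directly with no temporary file and no second open.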