RealtimeSTT
unable to run script
Hi there,
I've been desperate to try your script after I saw it on reddit (we had a brief chat), but I can't for the life of me figure out what's going on?
I've tried:
- Running from the GH repo with pip install realtimestt
- Running from the GH repo without pip install realtimestt
- Running in a different env just using pip install realtimestt
- Running your test scripts
- Running the most 'basic' vanilla script
Environment:
- MacBook Pro
- macOS Ventura Version 13.5.1 (22G90)
- Apple M2 Max
- Conda Environment (fresh)
- ffmpeg installed with Conda
- Python 3.11.5

Pip freeze dump:
av==10.0.0
certifi==2023.7.22
charset-normalizer==3.2.0
colorama==0.4.6
coloredlogs==15.0.1
ctranslate2==3.19.0
enum34==1.1.10
faster-whisper==0.8.0
filelock==3.12.4
flatbuffers==23.5.26
fsspec==2023.9.1
halo==0.0.31
huggingface-hub==0.17.1
humanfriendly==10.0
idna==3.4
Jinja2==3.1.2
log-symbols==0.0.14
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.1
numpy==1.25.2
onnxruntime==1.15.1
packaging==23.1
protobuf==4.24.3
pvporcupine==1.9.5
PyAudio==0.2.13
PyYAML==6.0.1
requests==2.31.0
six==1.16.0
spinners==0.0.24
sympy==1.12
termcolor==2.3.0
tokenizers==0.13.3
torch==2.0.1
torchaudio==2.0.2
tqdm==4.66.1
typing_extensions==4.7.1
urllib3==2.0.4
webrtcvad==2.0.10
Console dump:
[ctranslate2] [thread 2542417] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
File "/.../test whisper.py", line 4, in
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/site-packages/RealtimeSTT/audio_recorder.py", line 246, in __init__ self.silero_vad_model, _ = torch.hub.load(
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/site-packages/torch/hub.py", line 555, in load repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, trust_repo, "load",
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/site-packages/torch/hub.py", line 199, in _get_cache_or_reload repo_owner, repo_name, ref = _parse_repo_info(github)
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/site-packages/torch/hub.py", line 142, in _parse_repo_info with urlopen(f"https://github.com/{repo_owner}/{repo_name}/tree/main/"):
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/urllib/request.py", line 216, in urlopen return opener.open(url, data, timeout)
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/urllib/request.py", line 519, in open response = self._open(req, data)
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/urllib/request.py", line 536, in _open result = self._call_chain(self.handle_open, protocol, protocol +
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/urllib/request.py", line 496, in _call_chain result = func(*args)
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/urllib/request.py", line 1391, in https_open return self.do_open(http.client.HTTPSConnection, req,
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/urllib/request.py", line 1352, in do_open r = h.getresponse()
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/http/client.py", line 1378, in getresponse response.begin()
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/http/client.py", line 318, in begin version, status, reason = self._read_status()
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/http/client.py", line 279, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/socket.py", line 706, in readinto return self._sock.recv_into(b)
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/ssl.py", line 1278, in recv_into return self.read(nbytes, buffer)
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/ssl.py", line 1134, in read return self._sslobj.read(len, buffer)
KeyboardInterrupt
Exception ignored in: <function AudioToTextRecorder.__del__ at 0x1523b23e0>
Traceback (most recent call last):
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/site-packages/RealtimeSTT/audio_recorder.py", line 894, in __del__ self.shutdown()
File "/.../anaconda3/envs/open-interpreter/lib/python3.11/site-packages/RealtimeSTT/audio_recorder.py", line 397, in shutdown self.recording_thread.join()
AttributeError: 'AudioToTextRecorder' object has no attribute 'recording_thread'
Would love some help here!
Thanks,
The Captain
Hey there, thanks for trying out my stuff and helping to get rid of those early annoyances most fresh libraries have.
It looks like the main problem is that loading the Silero VAD model fails. The CTranslate2 warning (due to MacBook and CPU inference) and the other stuff (the Silero exception happens in the constructor and I don't handle that well in the shutdown) should not be real problems.
Could you please try to run some minimal code, only the silero model loading part, to see if the issue persists?
import torch
model, _ = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad", verbose=True)
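The AttributeError at the end of the first traceback is a follow-on error: the constructor was interrupted before recording_thread existed, and the shutdown triggered by the destructor then tried to join it. Below is a minimal sketch of a more defensive shutdown. SketchRecorder and its fail_early flag are hypothetical stand-ins for illustration, not the library's actual code.

```python
import threading


class SketchRecorder:
    """Hypothetical stand-in for a recorder whose constructor may fail early."""

    def __init__(self, fail_early=False):
        if fail_early:
            # Simulates model loading raising before the thread is created.
            raise RuntimeError("model loading failed")
        self.recording_thread = threading.Thread(target=lambda: None)
        self.recording_thread.start()

    def shutdown(self):
        # getattr with a default avoids an AttributeError when __init__
        # never reached the line that creates the thread.
        thread = getattr(self, "recording_thread", None)
        if thread is not None and thread.is_alive():
            thread.join()
        # True if a thread had been created, False otherwise.
        return thread is not None

    def __del__(self):
        self.shutdown()
```

With this shape, a failure inside the constructor surfaces only as the original exception, without the extra "Exception ignored in" noise from the destructor.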
Thanks for the reply, and I'll happily do just about anything - this is just what I've been looking for - hit me up for any test you fancy!
Running just that code I got the following in the console:
Using cache found in /Users/.cache/torch/hub/snakers4_silero-vad_master
Just in case it's useful, if I don't catch the error in the 1st 1/10th of a second, then I get hundreds of console logs as per the following.
`OSError: [Errno -9988] Stream closed
Error: [Errno -9988] Stream closed
RealTimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
Traceback: Traceback (most recent call last):
File ".../GitHub/RealtimeSTT/RealtimeSTT/audio_recorder.py", line 594, in _recording_worker
data = self.stream.read(self.buffer_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../anaconda3/envs/whisper/lib/python3.11/site-packages/pyaudio/__init__.py", line 570, in read
return pa.read_stream(self._stream, num_frames,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno -9988] Stream closed
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='
Current thread 0x00000001ea8c2080 (most recent call first): <no Python frame>
Extension modules: pyaudio._portaudio, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.enum, av.error, av.utils, av.option, av.descriptor, av.container.pyio, av.dictionary, av.format, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.audio.fifo, av.filter.pad, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.audio.resampler, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, yaml._yaml, charset_normalizer.md, _webrtcvad (total: 65) zsh: abort `
Thanks for the description and sorry for the inconvenience. It looks to me like Silero VAD loads using the minimal code but fails to load when you use RealtimeSTT. I can't say that for sure, since it should throw an exception and print a logging message in that case, which I do not see in the output. I haven't worked with a MacBook setup before, but I'll do my best to help with troubleshooting. Some ideas:
- Are the minimal code and RealtimeSTT running in the same Anaconda environment? If not, please make sure they are.
- RealtimeSTT is basically a single Python file (audio_recorder.py). You could try to download this file directly from the GitHub repository and run your code against this file. This helps rule out any issues torch.hub.load could have with the pip or Anaconda installation. https://github.com/KoljaB/RealtimeSTT/blob/master/RealtimeSTT/audio_recorder.py
- Try to run the AudioToTextRecorder class with the level=logging.DEBUG parameter (hopefully we see the logging); maybe we then see more details about what goes wrong.
- Try creating a new Anaconda environment solely for RealtimeSTT (I see open-interpreter in your path; I do not think it is the case, but maybe some libs collide).
For whatever reason I always download the full GH repo - that's what I meant when I said that I'd tried it with and without the pip install realtimestt in my original message. I tried it directly from the repo just in case. I'd also run it inside an entirely vanilla brand-new environment with only realtimestt and its dependencies. Of course that doesn't mean that I did it right, but I'm certainly trying to (not a coder!).
I put a wrapper around the script:
`import subprocess
import sys

def run_script(script_path):
    # Launch the script and capture stderr so each log line can be inspected.
    process = subprocess.Popen(['python', script_path], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while True:
        # readline() returns bytes; decode first so the string checks below work.
        output = process.stderr.readline().decode('utf-8')
        if output:
            print(output.strip())
            if "Error during recording: [Errno -9988] Stream closed" in output:
                process.terminate()
                print("Script terminated due to error")
                sys.exit(1)
        # An empty read from a finished process means there is nothing left to read.
        if output == '' and process.poll() is not None:
            break
    return process.poll()

run_script('/Users/GitHub/RealtimeSTT/test.py')`
Here's the test.py script, with the requested debug log level:
`from RealtimeSTT import AudioToTextRecorder
import logging

logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s')
recorder = AudioToTextRecorder(level=logging.DEBUG)
print("Say something...")
while (True):
    print(recorder.text(), end=" ", flush=True)
`
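One thing that may contribute to the missing debug output: logging.basicConfig() leaves the root logger at WARNING unless a level is passed, so DEBUG and INFO records sent to the root logger are dropped before they reach app.log. Here is a small standard-library sketch. The file name and format mirror test.py, configure_debug_log is a hypothetical helper name, and force=True needs Python 3.8 or newer. Whether RealtimeSTT configures logging on its own is not assumed either way.

```python
import logging


def configure_debug_log(path):
    # level=logging.DEBUG lowers the root logger threshold; without it the
    # root logger stays at WARNING and drops DEBUG/INFO records.
    logging.basicConfig(
        filename=path,
        filemode="w",
        level=logging.DEBUG,
        format="%(name)s - %(levelname)s - %(message)s",
        force=True,  # replace handlers installed by earlier basicConfig calls
    )


if __name__ == "__main__":
    configure_debug_log("app.log")
    logging.debug("debug records now reach the log file")
```

Calling the helper before the recorder is constructed means that records emitted during construction have somewhere to go.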
And here's the output from the console, this time it doesn't have all the forced 'break' info, so hopefully it will be more useful?
.../RealtimeSTT/realtimestt_wrapper.py b'[2023-09-21 21:35:52.870] [ctranslate2] [thread 191083] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.'
There's no other info in the console
Here's the log I saved:
root - WARNING - Input overflowed. Frame dropped. root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed root - ERROR - Error during recording: [Errno -9988] Stream closed
repeat this a few more times...
I wasn't getting anything from the debug, so I had a bit of a poke around in your code (hope you don't mind!)
The realtimestt folder from the repo was read only, so I had to change that, and I uncommented the file name so it would save the file.
I was still not getting anything, so I moved the logging statement to the top of the page, straight after the imports, and before the init.
Here's what I got:
`RealtimeSTT: root - INFO - AudioToTextRecorder object created RealtimeSTT: root - INFO - AudioToTextRecorder object created RealtimeSTT: urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): huggingface.co:443 RealtimeSTT: urllib3.connectionpool - DEBUG - https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-tiny/revision/main HTTP/1.1" 200 1812 RealtimeSTT: root - INFO - Initializing WebRTC voice with Sensitivity 3 RealtimeSTT: torchaudio._extension - DEBUG - Failed to initialize ffmpeg bindings Traceback (most recent call last): File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 85, in _init_ffmpeg _load_lib("libtorchaudio_ffmpeg") File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 61, in _load_lib torch.ops.load_library(path) File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torch/_ops.py", line 643, in load_library ctypes.CDLL(path) File "/Users/anaconda3/envs/realtimestt/lib/python3.11/ctypes/init.py", line 376, in init self._handle = _dlopen(self._name, mode) ^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: dlopen(/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so, 0x0006): Library not loaded: @rpath/libavdevice.58.dylib Referenced from: <00D3B28A-9088-32CE-B641-F43D64502379> /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so Reason: tried: '/Users/anaconda3/envs/realtimestt/lib/python3.11/lib-dynload/../../libavdevice.58.dylib' (no such file), '/Users/anaconda3/envs/realtimestt/bin/../lib/libavdevice.58.dylib' (no such file), '/usr/local/lib/libavdevice.58.dylib' (no such file), '/usr/lib/libavdevice.58.dylib' (no such file, not in dyld cache)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/init.py", line 67, in
except...
==> Downloading https://formulae.brew.sh/api/formula.jws.json ################################################################################################################# 100.0% ==> Downloading https://formulae.brew.sh/api/cask.jws.json ################################################################################################################# 100.0% Warning: ffmpeg 6.0_1 is already installed and up-to-date.
I read somewhere that torchaudio is incompatible with some ffmpeg versions. Can you check your ffmpeg version in a terminal with ffmpeg -version? I think ffmpeg 4.4 is the highest version torchaudio officially supports, so maybe it's worth trying to downgrade to that version.
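The version check can also be scripted. This sketch parses the first line of the ffmpeg -version output and compares it against the 4.4 ceiling mentioned above. That ceiling is a recollection, not a verified limit, and both function names are made up for this example.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_version(text):
    """Return (major, minor) parsed from `ffmpeg -version` output, or None."""
    match = re.search(r"ffmpeg version n?(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_ffmpeg_version():
    # shutil.which returns None when no ffmpeg binary is on PATH.
    if shutil.which("ffmpeg") is None:
        return None
    result = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return parse_ffmpeg_version(result.stdout)


if __name__ == "__main__":
    version = installed_ffmpeg_version()
    print("ffmpeg on PATH:", version)
    # Assumption: 4.4 is the highest supported version, as recalled above.
    if version is not None and version > (4, 4):
        print("Newer than 4.4; torchaudio's ffmpeg bindings may not load.")
```

Note that this only reports the ffmpeg binary found on PATH. torchaudio loads the FFmpeg shared libraries directly, so the binary version is a hint and not proof of which libraries get picked up.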
I've installed FFMPEG v 4.4.4 via brew install ffmpeg@4 >> https://formulae.brew.sh/formula/ffmpeg@4 "ffmpeg@4 4.4.4 is installed and up-to-date."
FFMPEG -Version gives the following output: ffmpeg -version ffmpeg version 4.4.4 Copyright (c) 2000-2023 the FFmpeg developers built with Apple clang version 14.0.3 (clang-1403.0.22.14.1) configuration: --prefix='/opt/homebrew/Cellar/ffmpeg@4/4.4.4' --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-avresample --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox libavutil 56. 70.100 / 56. 70.100 libavcodec 58.134.100 / 58.134.100 libavformat 58. 76.100 / 58. 76.100 libavdevice 58. 13.100 / 58. 13.100 libavfilter 7.110.100 / 7.110.100 libavresample 4. 0. 0 / 4. 0. 0 libswscale 5. 9.100 / 5. 9.100 libswresample 3. 9.100 / 3. 9.100 libpostproc 55. 9.100 / 55. 9.100
When I installed FFmpeg I got this message:
ffmpeg@4 is keg-only, which means it was not symlinked into /opt/homebrew, because this is an alternate version of another formula.
If you need to have ffmpeg@4 first in your PATH, run: echo 'export PATH="/opt/homebrew/opt/ffmpeg@4/bin:$PATH"' >> ~/.zshrc
For compilers to find ffmpeg@4 you may need to set: export LDFLAGS="-L/opt/homebrew/opt/ffmpeg@4/lib" export CPPFLAGS="-I/opt/homebrew/opt/ffmpeg@4/include"
For pkg-config to find ffmpeg@4 you may need to set: export PKG_CONFIG_PATH="/opt/homebrew/opt/ffmpeg@4/lib/pkgconfig"
I've created symbolic links: lrwxr-xr-x@ 1 user staff 59 Sep 24 00:17 libavcodec.58.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavcodec.58.dylib lrwxr-xr-x@ 1 user staff 60 Sep 24 00:10 libavdevice.58.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavdevice.58.dylib lrwxr-xr-x@ 1 user staff 59 Sep 24 00:16 libavfilter.7.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavfilter.7.dylib lrwxr-xr-x@ 1 user staff 60 Sep 24 00:16 libavformat.58.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavformat.58.dylib lrwxr-xr-x@ 1 user staff 58 Sep 24 00:17 libavutil.56.dylib -> /opt/homebrew/Cellar/ffmpeg@4/4.4.4/lib/libavutil.56.dylib
I've exported paths and such:
export DYLD_LIBRARY_PATH=/Users/anaconda3/envs/realtimestt/lib/python3.11/:$DYLD_LIBRARY_PATH
echo 'export PATH="/opt/homebrew/opt/ffmpeg@4/bin:$PATH"' >> ~/.zshrc
echo 'export LDFLAGS="-L/opt/homebrew/opt/ffmpeg@4/lib"' >> ~/.zshrc
echo 'export CPPFLAGS="-I/opt/homebrew/opt/ffmpeg@4/include"' >> ~/.zshrc
echo 'export PKG_CONFIG_PATH="/opt/homebrew/opt/ffmpeg@4/lib/pkgconfig"' >> ~/.zshrc
I then reloaded zsh (source ~/.zshrc) and reloaded the window in VS Code to make sure it was all updated.
(Chat GPT told me what to do and how to do it when I fed the error message in the log into it)
Here's the log... RealtimeSTT: root - INFO - AudioToTextRecorder object created RealtimeSTT: urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): huggingface.co:443 RealtimeSTT: urllib3.connectionpool - DEBUG - https:/huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-tiny/revision/main HTTP/1.1" 200 1812 RealtimeSTT: root - INFO - Initializing WebRTC voice with Sensitivity 3 RealtimeSTT: torchaudio._extension - DEBUG - Failed to initialize ffmpeg bindings Traceback (most recent call last): File "...anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 85, in _init_ffmpeg _load_lib("libtorchaudio_ffmpeg") File ".../anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 61, in _load_lib torch.ops.load_library(path) File "...anaconda3/envs/realtimestt/lib/python3.11/site-packages/torch/_ops.py", line 643, in load_library ctypes.CDLL(path) File ".../anaconda3/envs/realtimestt/lib/python3.11/ctypes/init.py", line 376, in init self._handle = _dlopen(self._name, mode) ^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: dlopen(/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so, 0x0006): Library not loaded: @rpath/libavdevice.58.dylib Referenced from: <00D3B28A-9088-32CE-B641-F43D64502379> /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so Reason: tried: '/Users/anaconda3/envs/realtimestt/lib/python3.11/lib-dynload/../../libavdevice.58.dylib' (no such file), '/Users/anaconda3/envs/realtimestt/bin/../lib/libavdevice.58.dylib' (no such file), '/usr/local/lib/libavdevice.58.dylib' (no such file), '/usr/lib/libavdevice.58.dylib' (no such file, not in dyld cache)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/torchaudio/_extension/init.py", line 67, in
I am currently at a bit of a loss?
I don't understand this documentation, but perhaps it will be of some use? >> https://pytorch.org/audio/main/_modules/torchaudio/utils/ffmpeg_utils.html
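The dlopen error in the log lists the directories that were searched for libavdevice.58.dylib. A quick way to see whether the file (or a working symlink to it) is actually present in any of them is sketched below. The directory list is copied from the "tried:" part of the error, with the relative segments collapsed, and locate_library is a hypothetical helper name.

```python
import os


def locate_library(file_name, search_dirs):
    """Return the resolved paths of file_name found in search_dirs."""
    hits = []
    for directory in search_dirs:
        candidate = os.path.join(directory, file_name)
        # os.path.exists follows symlinks, so a dangling link counts as missing.
        if os.path.exists(candidate):
            hits.append(os.path.realpath(candidate))
    return hits


if __name__ == "__main__":
    # Directories taken from the "tried:" part of the dlopen error.
    dirs = [
        "/Users/anaconda3/envs/realtimestt/lib",
        "/usr/local/lib",
        "/usr/lib",
    ]
    print(locate_library("libavdevice.58.dylib", dirs))
```

An empty list would mean the symlinks are not in a directory the loader searches, or that they point at a target that does not exist.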
After doing a load of testing I have found that if I remove the torchaudio from the requirements.txt file, then install the ffmpeg and the torchaudio via the Conda Pytorch channel then I can overcome the bindings problem.
I install them with
conda install -c pytorch torchaudio ffmpeg
This then installs both packages with their dependencies. It installs ffmpeg version 4.2.2:
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
and
Name Version Build Channel
torchaudio 2.0.2 py311_cpu pytorch
Now my log file is a BIT easier...
RealtimeSTT: root - INFO - AudioToTextRecorder object created RealtimeSTT: urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): huggingface.co:443 RealtimeSTT: urllib3.connectionpool - DEBUG - https://huggingface.co:443 "GET /api/models/guillaumekln/faster-whisper-tiny/revision/main HTTP/1.1" 200 1812 RealtimeSTT: root - INFO - Initializing WebRTC voice with Sensitivity 3 RealtimeSTT: root - INFO - _recording_worker method called RealtimeSTT: root - DEBUG - Starting recording worker RealtimeSTT: root - DEBUG - Starting realtime worker RealtimeSTT: root - DEBUG - Constructor finished RealtimeSTT: root - WARNING - Input overflowed. Frame dropped. RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
The console output is the part I'm currently having difficulty with:
objc[13119]: Class AVFFrameReceiver is implemented in both /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/av/.dylibs/libavdevice.59.7.100.dylib (0x10367c778) and /Users/anaconda3/envs/realtimestt/lib/libavdevice.58.8.100.dylib (0x13a4f0798). One of the two will be used. Which one is undefined.
objc[13119]: Class AVFAudioReceiver is implemented in both /Users/anaconda3/envs/realtimestt/lib/python3.11/site-packages/av/.dylibs/libavdevice.59.7.100.dylib (0x10367c7c8) and /Users/anaconda3/envs/realtimestt/lib/libavdevice.58.8.100.dylib (0x13a4f07e8). One of the two will be used. Which one is undefined.
No matter which version I remove using rm, I end up with a situation that it crashes with a console output telling me that the file that it's looking for can't be found....
I'm sure there's some obvious way around this, but I don't know what it is.
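The objc warnings show two copies of libavdevice loaded from different places: one bundled inside the av package and one in the environment's lib folder. To get an overview of which libraries exist in more than one location before deleting anything, something like this could help. It is only a listing aid under the assumption that the files end in .dylib, and find_duplicate_dylibs is a hypothetical helper name.

```python
import os
import re
from collections import defaultdict


def find_duplicate_dylibs(root):
    """Map base library name -> sorted directories, for names found in 2+ directories."""
    found = defaultdict(set)
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            # "libavdevice.59.7.100.dylib" -> "libavdevice"
            match = re.match(r"(lib[^.]+)\..*dylib$", name)
            if match:
                found[match.group(1)].add(folder)
    return {lib: sorted(dirs) for lib, dirs in found.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Environment prefix as it appears in the log output.
    for lib, dirs in find_duplicate_dylibs("/Users/anaconda3/envs/realtimestt").items():
        print(lib, dirs)
```

Since removing either copy made something else fail to load, each copy appears to be required by a different package, so the listing is mainly useful for deciding which package to reinstall or pin.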
Apologies if the updates are spamming you, but I grab the time as and when I can ( kids ;-) ), so I'm noting it both as an aide-mémoire, and also in case I don't finish what I'm working on, so that you have some idea of what 'progress' or otherwise is being made if I don't get back to it for a day or two.
Thanks for your detailed feedback and your patience. Unfortunately I am also very lost on the conflicting versions of the shared libraries. This is so environment- and MacBook-related, and I lack the experience with Apple products and deployment to really give helpful advice here. It seems quite strange that using a clean conda install in a new environment causes such library conflicts.
What I do see in the log files is that something with the pyAudio stream is going wrong:
RealtimeSTT: root - WARNING - Input overflowed. Frame dropped. RealtimeSTT: root - ERROR - Error during recording: [Errno -9988] Stream closed
The first message means a pyaudio.paInputOverflowed exception was raised, indicating that audio samples were dropped from the input stream. The stream probably gets closed as a consequence of that.
I am currently unsure why this exception gets raised. I would suggest doing a very basic test of your pyAudio installation like this:
import pyaudio

class SimpleAudioRecorder:
    def __init__(self):
        self.rate = 16000
        self.format = pyaudio.paInt16
        self.channels = 1
        self.input = True
        self.buffer_size = 512
        self.pa = pyaudio.PyAudio()
        self.stream = self.pa.open(rate=self.rate,
                                   format=self.format,
                                   channels=self.channels,
                                   input=self.input,
                                   frames_per_buffer=self.buffer_size)

    def record(self):
        print("Recording for 5 seconds...")
        frames = []
        for _ in range(0, int(self.rate / self.buffer_size * 5)):
            try:
                data = self.stream.read(self.buffer_size)
                frames.append(data)
            except IOError as e:
                print(f"Error recording data: {e}")
        print("Recording complete!")
        return b''.join(frames)

    def close(self):
        self.stream.stop_stream()
        self.stream.close()
        self.pa.terminate()

if __name__ == '__main__':
    recorder = SimpleAudioRecorder()
    audio_data = recorder.record()
    recorder.close()
Since it uses the same pyAudio logic as RealtimeSTT, it should fail too. If it does, we have the pyAudio issue pinned down to simple demo code and can focus better on getting rid of it.
Sorry for the delay in getting back to you.
That code snippet worked just fine...
Recording for 5 seconds... Recording complete!
I wanted to make sure that it was actually recording, so added the following:
with wave.open(filename, 'wb') as wf:
    wf.setnchannels(self.channels)
    wf.setsampwidth(self.pa.get_sample_size(self.format))
    wf.setframerate(self.rate)
    wf.writeframes(b''.join(frames))
It output the file, and it was fine as a recording.
Ok, I finally start to get a grasp of what is happening. Sorry for all the issues.
You get a pyaudio.paInputOverflowed exception even though your stream itself works. I think the reason must be that the script does not call the read() method fast enough to consume the incoming audio data, causing the buffer to fill up and overflow. The only thing in that loop that really takes time to process is the WebRTC voice activity detection I do after reading from the stream. I expected this to be fast enough to do in the stream loop, but it turns out it isn't.
So I need to update the library and perform the WebRTC voice activity detection in another thread like I already do it with SileroVAD. But this forces me to redesign some things, I also rely on WebRTC when detecting end of speech and I need a clean recording worker.
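The idea of taking the voice activity check out of the read loop can be sketched with a queue between two threads. This is a generic producer/consumer illustration and not the library's actual redesign. read_chunk stands in for stream.read(...), is_speech for the WebRTC check, and both are placeholders supplied by the caller.

```python
import queue
import threading


def run_pipeline(read_chunk, is_speech, num_chunks):
    """Read chunks in one thread and classify them in another."""
    chunks = queue.Queue()
    results = []

    def reader():
        # Only reads and enqueues, so slow analysis never delays the next read.
        for _ in range(num_chunks):
            chunks.put(read_chunk())
        chunks.put(None)  # sentinel: no more audio

    def analyzer():
        while True:
            chunk = chunks.get()
            if chunk is None:
                break
            results.append(is_speech(chunk))

    threads = [threading.Thread(target=reader), threading.Thread(target=analyzer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The queue absorbs short bursts where analysis is slower than capture. If analysis is slower on average, the queue grows without bound, which is one reason a bounded queue or a separate process may be the better fit.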
So, as I need to redesign some things anyway, I think I should also switch the main transcription logic from multithreading to multiprocessing. The current implementation isn't perfect on the occasions where VAD and transcription run in parallel in threads only. I guess Python's global interpreter lock keeps them from interleaving smoothly.
I will think about all that for a while. Then I will do a new release with reworked and hopefully more solid recording, VAD and transcription. Will take me something between one and three weeks I guess.
Just released a new version with a separated recording process. I really, really hope that this will solve your problem too (can't promise ofc). Maybe you can give it a try, I would love to hear some feedback.
Edit: you need to update your client code and include the if __name__ == '__main__': protection due to the multiprocessing update. The files in the test directory are already all updated; please look at the new realtimestt_test.py file.
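For anyone wondering why the guard matters: with the "spawn" start method (the default on macOS and Windows) each child process re-imports the main script, so any unguarded top-level code would run again in the child. Here is a standard-library sketch of the pattern. transcribe_stub and worker are hypothetical placeholders and have nothing to do with the library's internals.

```python
import multiprocessing
import queue


def transcribe_stub(audio):
    # Placeholder for real transcription work done in the child process.
    return f"{len(audio)} bytes received"


def worker(audio, results):
    results.put(transcribe_stub(audio))


def main():
    results = multiprocessing.Queue()
    process = multiprocessing.Process(target=worker, args=(b"\x00" * 1024, results))
    process.start()
    try:
        print(results.get(timeout=5))
    except queue.Empty:
        print("no result from worker")
    process.join(timeout=5)


# Under "spawn" the child re-imports this module; without the guard it would
# call main() again and try to start yet another process.
if __name__ == "__main__":
    main()
```

Client scripts follow the same shape: create the recorder and run the loop inside the guarded block, not at module level.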