whispercpp
Bug: ERROR: Failed to initialized SDL: dsp: No such audio device
Describe the bug
Streaming fails: the Python binding cannot find or list any audio devices.
To reproduce
Install following the standard installation instructions, then run this stream.py:
"""Some streaming examples."""
import os
import sys
import typing as t

import whispercpp_py as w


def main(**kwargs: t.Any):
    kwargs.pop("list_audio_devices")
    mname = kwargs.pop("model_name", os.getenv("GGML_MODEL", "tiny.en"))
    iterator: t.Iterator[str] | None = None
    try:
        iterator = w.Whisper.from_pretrained(mname).stream_transcribe(**kwargs)
    finally:
        assert iterator is not None, "Something went wrong!"
        sys.stderr.writelines(
            ["\nTranscription (line by line):\n"] + [f"{it}\n" for it in iterator]
        )
        sys.stderr.flush()


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", required=False)
    parser.add_argument(
        "--device_id", type=int, help="Choose the audio device", default=0
    )
    parser.add_argument(
        "--length_ms",
        type=int,
        help="Length of the audio buffer in milliseconds",
        default=5000,
    )
    parser.add_argument(
        "--sample_rate",
        type=int,
        help="Sample rate of the audio device",
        default=w.api.SAMPLE_RATE,
    )
    parser.add_argument(
        "--n_threads",
        type=int,
        help="Number of threads to use for decoding",
        default=8,
    )
    parser.add_argument(
        "--step_ms",
        type=int,
        help="Step size of the audio buffer in milliseconds",
        default=2000,
    )
    parser.add_argument(
        "--keep_ms",
        type=int,
        help="Length of the audio buffer to keep in milliseconds",
        default=200,
    )
    parser.add_argument(
        "--max_tokens",
        type=int,
        help="Maximum number of tokens to decode",
        default=32,
    )
    parser.add_argument("--audio_ctx", type=int, help="Audio context", default=0)
    parser.add_argument(
        "--list_audio_devices",
        action="store_true",
        default=False,
        help="Show available audio devices",
    )
    args = parser.parse_args()
    if args.list_audio_devices:
        w.utils.available_audio_devices()
        sys.exit(0)
    main(**vars(args))
$ python3 stream.py --list_audio_devices
ERROR: Failed to initialized SDL: dsp: No such audio device
$ python3 stream.py --model_name ggml-base.en.bin
whisper_init_from_file_no_state: loading model from 'ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 2
whisper_model_load: mem required = 218.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.60 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
ERROR: Failed to initialized SDL: dsp: No such audio device
Traceback (most recent call last):
  File "/home/acheong/.models/whisper_ggml/stream.py", line 15, in main
    iterator = w.Whisper.from_pretrained(mname).stream_transcribe(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/acheong/venv/lib/python3.11/site-packages/whispercpp_py/__init__.py", line 256, in stream_transcribe
    raise RuntimeError("Failed to initialize audio capture device.")
RuntimeError: Failed to initialize audio capture device.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/acheong/.models/whisper_ggml/stream.py", line 82, in <module>
    main(**vars(args))
  File "/home/acheong/.models/whisper_ggml/stream.py", line 17, in main
    assert iterator is not None, "Something went wrong!"
           ^^^^^^^^^^^^^^^^^^^^
AssertionError: Something went wrong!
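Side note on the double traceback: the AssertionError is a by-product of the script, not a second failure in the library. The assert in the `finally` block runs while the RuntimeError is still propagating, so it is reported last. A minimal sketch of that effect, plus one way to surface the real error. `failing_stream`, `masked` and `unmasked` are hypothetical stand-ins written for illustration; none of this calls whispercpp itself.

```python
import typing as t


def failing_stream() -> t.Iterator[str]:
    # Hypothetical stand-in for stream_transcribe() failing to open a device.
    raise RuntimeError("Failed to initialize audio capture device.")


def masked() -> None:
    # Same shape as main() in stream.py: the assert in `finally` raises while
    # the RuntimeError is still propagating, so it becomes the last error shown.
    iterator: t.Optional[t.Iterator[str]] = None
    try:
        iterator = failing_stream()
    finally:
        assert iterator is not None, "Something went wrong!"


def unmasked() -> t.List[str]:
    # Alternative: handle the failure explicitly and only consume the
    # iterator when the call actually succeeded.
    try:
        iterator = failing_stream()
    except RuntimeError as exc:
        raise SystemExit(f"audio capture unavailable: {exc}") from exc
    return list(iterator)
```

With the second shape the script exits with a single message naming the audio capture failure, which is the part that matters for this report.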
Expected behavior
(venv) [ 12:14AM ] [ acheong@InsignificantV3:~/.models/whisper_ggml/whisper.cpp(master✔) ]
$ ./stream -m ~/.models/whisper_ggml/ggml-base.en.bin -t 8 --step 500 --length 5000
init: found 1 capture devices:
init: - Capture device #0: 'Built-in Audio Analog Stereo'
init: attempt to open default capture device ...
init: obtained spec for input device (SDL Id = 2):
init: - sample rate: 16000
init: - format: 33056 (required: 33056)
init: - channels: 1 (required: 1)
init: - samples per frame: 1024
whisper_init_from_file_no_state: loading model from '/home/acheong/.models/whisper_ggml/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2
whisper_model_load: mem required = 310.00 MB (+ 6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx = 140.66 MB
whisper_model_load: model size = 140.54 MB
whisper_init_state: kv self size = 5.25 MB
whisper_init_state: kv cross size = 17.58 MB
main: processing 8000 samples (step = 0.5 sec / len = 5.0 sec / keep = 0.2 sec), 8 threads, lang = en, task = transcribe, timestamps = 0 ...
main: n_new_line = 9, no_context = 1
This is your... this more you.
(drum roll)
whisper_print_timings: load time = 85.85 ms
whisper_print_timings: fallbacks = 1 p / 0 h
whisper_print_timings: mel time = 1614.78 ms
whisper_print_timings: sample time = 293.89 ms / 431 runs ( 0.68 ms per run)
whisper_print_timings: encode time = 10957.55 ms / 8 runs ( 1369.69 ms per run)
whisper_print_timings: decode time = 1747.24 ms / 420 runs ( 4.16 ms per run)
whisper_print_timings: total time = 16279.01 ms
Environment
$ python -V
Python 3.11.2
acheong@InsignificantV3
-----------------------
OS: Ubuntu 23.04 x86_64
Host: Laptop AB
Kernel: 6.2.8-060208-generic
Uptime: 9 hours, 3 mins
Packages: 4237 (dpkg), 47 (nix-default), 14 (flatpak), 27 (snap)
Shell: zsh 5.9
Resolution: 2256x1504
DE: GNOME 44.0
WM: Mutter
WM Theme: WhiteSur-Dark
Theme: WhiteSur-Dark [GTK2/3]
Icons: WhiteSur-dark [GTK2/3]
Terminal: gnome-terminal
CPU: 11th Gen Intel i7-1165G7 (8) @ 4.700GHz
GPU: Intel TigerLake-LP GT2 [Iris Xe Graphics]
Memory: 7047MiB / 15769MiB
Which environment is this known to work in? On my PC it crashes with a core dump:
Eric@Eric-thurley:~/Downloads/whispercpp-0.0.17/examples/stream$ python3 stream.py --list_audio_devices
Illegal instruction (core dumped)
I'm also experiencing this same issue regarding SDL2
https://github.com/ggerganov/whisper.cpp works so I assume it's a binding issue
I'm experiencing the same issue.
See here https://github.com/AIWintermuteAI/whispercpp/issues/88#issuecomment-2237043595
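One thing that may be worth checking alongside the linked comment: `dsp` in the error message is the name of SDL2's OSS (/dev/dsp) audio driver, which suggests the SDL used by the binding is not selecting PulseAudio or ALSA. SDL2 reads the `SDL_AUDIODRIVER` environment variable when it initializes audio, so a rough sketch of forcing a backend before the binding is imported could look like this. `prefer_sdl_audio_driver` is a hypothetical helper, not part of whispercpp, and this can only help if the SDL the binding was built against includes that driver, which is an assumption.

```python
import os
import typing as t


def prefer_sdl_audio_driver(
    candidates: t.Sequence[str] = ("pulseaudio", "alsa"),
    env: t.Optional[t.MutableMapping[str, str]] = None,
) -> str:
    # Hypothetical helper: pick an SDL2 audio backend via SDL_AUDIODRIVER.
    env = os.environ if env is None else env
    # Respect a value the user already exported in their shell.
    if env.get("SDL_AUDIODRIVER"):
        return env["SDL_AUDIODRIVER"]
    # Otherwise request the first candidate; SDL fails at init if it is missing.
    env["SDL_AUDIODRIVER"] = candidates[0]
    return candidates[0]
```

Call it at the top of stream.py, before `import whispercpp_py`, or simply export `SDL_AUDIODRIVER=pulseaudio` in the shell and rerun with `--list_audio_devices`.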