invalid model data and Error opening <_io.TextIOWrapper name='jfk.mp3' mode='r' encoding='UTF-8'>: Format not recognised.
The code I'm using:

```python
from whisper_cpp_python import Whisper

whisper = Whisper(model_path="ggml-tiny.en.bin")

output = whisper.transcribe(open('jfk.mp3'))
print(output)

output = whisper.transcribe(open('jfk.mp3'), response_format='verbose_json')
print(output)
```
I tried with three different Python versions: 3.11, 3.12, and 3.13. On 3.13 the package didn't install at all, but on 3.11 and 3.12 it shows the following:
```
➜ stt_packages python3 app.py
whisper_init_from_file_no_state: loading model from 'ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: invalid model data (bad magic)
whisper_init_no_state: failed to load model
Exception ignored from cffi callback <function SoundFile._init_virtual_io.<locals>.vio_read at 0x750c8c8aafc0>:
Traceback (most recent call last):
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/soundfile.py", line 1300, in vio_read
buf[0:data_read] = data
~~~^^^^^^^^^^^^^
TypeError: a bytes-like object is required, not 'str'
Traceback (most recent call last):
File "/home/x/stt_packages/app.py", line 3, in <module>
output = whisper.transcribe(open('jfk.mp3'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/whisper_cpp_python/whisper.py", line 21, in transcribe
data, sr = librosa.load(file, sr=Whisper.WHISPER_SR)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/librosa/core/audio.py", line 186, in load
raise exc
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/librosa/core/audio.py", line 176, in load
y, sr_native = __soundfile_load(path, offset, duration, dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/librosa/core/audio.py", line 209, in __soundfile_load
context = sf.SoundFile(path)
^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/soundfile.py", line 690, in __init__
self._file = self._open(file, mode_int, closefd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/soundfile.py", line 1265, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening <_io.TextIOWrapper name='jfk.mp3' mode='r' encoding='UTF-8'>: Format not recognised.
```
I'm facing this issue.
I am facing the same issue. I am not so sure this repo is still active. From a quick dive into it, it seems that soundfile is out of date, and there seems to be a clash between the soundfile and the numpy versions used. I have not figured out the right balance, and I am not going to bother much more.
I can offer two things:

- https://github.com/absadiki/pywhispercpp ==> the last commit was 3 weeks ago, so it seems more active than this repo (rough usage sketch below).
- I am currently building a minimal Flask server around whisper.cpp (the Vulkan build in my case, but it could be anything). If you're interested in that, let me know and I can share it once it's done.
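For reference, here is a minimal sketch of what transcription with pywhispercpp might look like; the model name, file name, and segment attributes are assumptions based on that project's README, so verify against its docs:

```python
# Minimal sketch, assuming pywhispercpp's Model API (check its README).
from pywhispercpp.model import Model

model = Model('tiny.en')                # loads/downloads the ggml model by name
segments = model.transcribe('jfk.mp3')  # returns a list of segment objects
for seg in segments:
    print(seg.text)
```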
@nicoKoehler Thanks for responding. I started using Vosk instead of whisper.cpp; it works much better for me, runs fully locally, and works on CPU.
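If it helps, a rough sketch of offline file transcription with Vosk looks like this; the model folder and WAV file names are placeholders, and Vosk expects 16-bit mono PCM WAV input (see the official Vosk examples):

```python
# Rough sketch of offline file transcription with Vosk (model path and input
# file are placeholders; input must be 16-bit mono PCM WAV).
import json
import wave

from vosk import Model, KaldiRecognizer

wf = wave.open('jfk.wav', 'rb')
model = Model('vosk-model-small-en-us-0.15')   # path to a downloaded model folder
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

print(json.loads(rec.FinalResult())['text'])
```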
@PlanetDestroyyer CPU or GPU? Because I couldn't find anything for Vosk with AMD GPUs (my use case). If you only require CPU, you could also use plain Whisper from OpenAI, since it defaults to CPU when no GPU is recognized (or specified).
@nicoKoehler Vosk with CUDA exists, and I want to run on an RPi, so whisper.cpp isn't the best choice for me.
@PlanetDestroyyer RPi = Raspberry Pi? If so, how are you attaching the GPU? I have a similar use case and also wanted to get it running on an RPi, but AMD GPUs are even worse there.
@nicoKoehler No GPU, running directly on the CPU of an RPi 5; it works smoothly.
@PlanetDestroyyer May I ask what performance you are getting? With my GPU in whisper.cpp I am getting about 0.1 processing minutes per audio minute, so a 10-minute file takes 1 minute to transcribe. When I was still on my i7 CPU it was more like 0.5 pm/am.
It's almost real time, with about a 1-second delay. On a low-end system like mine (Ryzen 3 3250U, 2 cores / 4 threads, 2.6 GHz) it's around a 1.5-second delay for real-time transcription. I'd highly recommend you try it once.
> From a quick dive into it, it seems that soundfile is out of date, and there seems to be a clash between the soundfile and the numpy versions used. I have not figured out the right balance, and I am not going to bother much more.
The reason is actually far simpler. The Readme.md states you can use `open()` without any other parameters:
```python
output = whisper.transcribe(open('jfk.mp3'))
```
But that actually makes `open()` fall back to the default text read mode, as documented here: https://docs.python.org/3/library/functions.html#open
| Character | Meaning |
| --- | --- |
| `'r'` | open for reading (default) |
| `'b'` | binary mode |
| `'t'` | text mode (default) |

> The default mode is 'r' (open for reading text, a synonym of 'rt').
In this mode, the raw bytes are implicitly decoded as UTF-8 text. Since sound files consist mostly of non-printable bytes, this step corrupts the byte stream before it reaches the soundfile library, which expects bytes rather than str.
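You can see this directly in the interpreter (just a quick check, using the same file name as above):

```python
# In default text mode, read() returns str (decoded as UTF-8), which is exactly
# what soundfile complains about ("a bytes-like object is required, not 'str'").
# Depending on the file's bytes, the decode may even raise UnicodeDecodeError.
with open('jfk.mp3') as f:   # mode defaults to 'r', i.e. 'rt'
    print(type(f.read(3)))   # <class 'str'>
```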
Switching to binary mode fixes this issue:
```python
output = whisper.transcribe(open('jfk.mp3', mode='rb'))
```
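As a side note (an observation from the traceback above rather than from the README): `transcribe()` just forwards its argument to `librosa.load()`, so passing the path directly should work as well, and a `with` block avoids leaving the file handle open:

```python
# Passing the filename straight through; librosa.load() accepts a path too.
output = whisper.transcribe('jfk.mp3')

# Or, if you prefer an explicit file object, make sure it gets closed again.
with open('jfk.mp3', 'rb') as f:
    output = whisper.transcribe(f)
```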