invalid model data and Error opening <_io.TextIOWrapper name='jfk.mp3' mode='r' encoding='UTF-8'>: Format not recognised.
The code I'm using:

```python
from whisper_cpp_python import Whisper

whisper = Whisper(model_path="ggml-tiny.en.bin")

output = whisper.transcribe(open('jfk.mp3'))
print(output)

output = whisper.transcribe(open('jfk.mp3'), response_format='verbose_json')
print(output)
```
I tried with three different Python versions: 3.11, 3.12, and 3.13. On 3.13 the package didn't install at all, but on 3.11 and 3.12 it shows the following:
```
➜ stt_packages python3 app.py
whisper_init_from_file_no_state: loading model from 'ggml-tiny.en.bin'
whisper_model_load: loading model
whisper_model_load: invalid model data (bad magic)
whisper_init_no_state: failed to load model
Exception ignored from cffi callback <function SoundFile._init_virtual_io.<locals>.vio_read at 0x750c8c8aafc0>:
Traceback (most recent call last):
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/soundfile.py", line 1300, in vio_read
buf[0:data_read] = data
~~~^^^^^^^^^^^^^
TypeError: a bytes-like object is required, not 'str'
Traceback (most recent call last):
File "/home/x/stt_packages/app.py", line 3, in <module>
output = whisper.transcribe(open('jfk.mp3'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/whisper_cpp_python/whisper.py", line 21, in transcribe
data, sr = librosa.load(file, sr=Whisper.WHISPER_SR)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/librosa/core/audio.py", line 186, in load
raise exc
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/librosa/core/audio.py", line 176, in load
y, sr_native = __soundfile_load(path, offset, duration, dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/librosa/core/audio.py", line 209, in __soundfile_load
context = sf.SoundFile(path)
^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/soundfile.py", line 690, in __init__
self._file = self._open(file, mode_int, closefd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/x/stt_packages/whisper_cpp_env/lib/python3.11/site-packages/soundfile.py", line 1265, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening <_io.TextIOWrapper name='jfk.mp3' mode='r' encoding='UTF-8'>: Format not recognised.
```
I'm facing this issue.
I am facing the same issue. I am not so sure this repo is still active. From a quick dive into it, it seems that soundfile is out of date, and there seems to be a clash between the soundfile and the numpy versions used. I have not figured out the right balance, and I am not going to bother much more.
I can offer two things:

- https://github.com/absadiki/pywhispercpp ==> the last commit was 3 weeks ago, so it seems more active than this repo (rough usage sketch below).
- I am currently building a minimal Flask server around whisper.cpp (the Vulkan build in my case, but it could be anything). If you're interested in that, let me know and I can share it once it's done.
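For reference, here is a minimal sketch of what transcription with pywhispercpp might look like; the model name, file name, and segment attributes are assumptions based on that project's README, so verify against its docs:

```python
# Minimal sketch, assuming pywhispercpp's Model API (check its README).
from pywhispercpp.model import Model

model = Model('tiny.en')                # loads/downloads the ggml model by name
segments = model.transcribe('jfk.mp3')  # returns a list of segment objects
for seg in segments:
    print(seg.text)
```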
@nicoKoehler Thanks for responding. I started using Vosk instead of whisper.cpp; it works much better for me, runs fully locally, and works on CPU.
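If it helps, a rough sketch of offline file transcription with Vosk looks like this; the model folder and WAV file names are placeholders, and Vosk expects 16-bit mono PCM WAV input (see the official Vosk examples):

```python
# Rough sketch of offline file transcription with Vosk (model path and input
# file are placeholders; input must be 16-bit mono PCM WAV).
import json
import wave

from vosk import Model, KaldiRecognizer

wf = wave.open('jfk.wav', 'rb')
model = Model('vosk-model-small-en-us-0.15')   # path to a downloaded model folder
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)

print(json.loads(rec.FinalResult())['text'])
```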
@PlanetDestroyyer CPU or GPU? Because I couldn't find anything for Vosk with AMD GPUs (my use case). If you only require CPU, you could also use plain Whisper from OpenAI, since it defaults to CPU when no GPU is recognized (or specified).
@nicoKoehler Vosk with CUDA exists, and I want to run on an RPi, so whisper.cpp isn't the best choice for me.
@PlanetDestroyyer RPi = Raspberry Pi? If so, how are you attaching the GPU? I have a similar use case and also wanted to get it running on an RPi, but AMD GPUs are even worse there.
@nicoKoehler No GPU, running directly on the CPU of an RPi 5; it works smoothly.
@PlanetDestroyyer May I ask what performance you are getting? With my GPU in whisper.cpp I am getting about 0.1 processing minutes per audio minute, so a 10-minute file takes 1 minute to transcribe. When I was still on my i7 CPU it was more like 0.5 pm/am.
It's almost real time, with about a 1-second delay. On a low-end system like mine (Ryzen 3 3250U, 2 cores / 4 threads, 2.6 GHz) it's around a 1.5-second delay for real-time transcription. I'd highly recommend you try it once.
> From a quick dive into it, it seems that soundfile is out of date, and there seems to be a clash between the soundfile and the numpy versions used. I have not figured out the right balance, and I am not going to bother much more.
The reason is actually far simpler. The Readme.md states you can use `open()` without any other parameters:
```python
output = whisper.transcribe(open('jfk.mp3'))
```
But that actually makes `open()` fall back to the default text read mode, as documented here: https://docs.python.org/3/library/functions.html#open
| Character | Meaning |
| --- | --- |
| `'r'` | open for reading (default) |
| `'b'` | binary mode |
| `'t'` | text mode (default) |

> The default mode is 'r' (open for reading text, a synonym of 'rt').
In this mode, the raw bytes are implicitly decoded as UTF-8 text. Since sound files consist mostly of non-printable bytes, this step corrupts the byte stream before it reaches the soundfile library, which expects bytes rather than str.
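You can see this directly in the interpreter (just a quick check, using the same file name as above):

```python
# In default text mode, read() returns str (decoded as UTF-8), which is exactly
# what soundfile complains about ("a bytes-like object is required, not 'str'").
# Depending on the file's bytes, the decode may even raise UnicodeDecodeError.
with open('jfk.mp3') as f:   # mode defaults to 'r', i.e. 'rt'
    print(type(f.read(3)))   # <class 'str'>
```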
Switching to binary mode fixes this issue:
```python
output = whisper.transcribe(open('jfk.mp3', mode='rb'))
```
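As a side note (an observation from the traceback above rather than from the README): `transcribe()` just forwards its argument to `librosa.load()`, so passing the path directly should work as well, and a `with` block avoids leaving the file handle open:

```python
# Passing the filename straight through; librosa.load() accepts a path too.
output = whisper.transcribe('jfk.mp3')

# Or, if you prefer an explicit file object, make sure it gets closed again.
with open('jfk.mp3', 'rb') as f:
    output = whisper.transcribe(f)
```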