whispercpp
Pybind11 bindings for Whisper.cpp
Quickstart
Install with pip:
pip install whispercpp
NOTE: On platforms without prebuilt wheels, installation sets up a hermetic toolchain (so you don't have to set up anything yourself to install the Python package), which makes the install take a bit longer. Pass -vv to pip to see the progress.
To use the latest version, install from source:
pip install git+https://github.com/aarnphm/whispercpp.git -vv
For local setup, initialize all submodules:
git submodule update --init --recursive
Build the wheel:
# Option 1: using pypa/build
python3 -m build -w
# Option 2: using bazel
./tools/bazel build //:whispercpp_wheel
Install the wheel:
# Option 1: via pypa/build
pip install dist/*.whl
# Option 2: using bazel
pip install $(./tools/bazel info bazel-bin)/*.whl
The binding provides a Whisper class:
from whispercpp import Whisper
w = Whisper.from_pretrained("tiny.en")
Currently, the inference API is provided via transcribe:
import numpy as np
w.transcribe(np.ones((1, 16000)))
You can use any of your favorite audio libraries (ffmpeg, librosa, or whispercpp.api.load_wav_file) to load audio files into a NumPy array, then pass it to transcribe:
import ffmpeg
import numpy as np

sample_rate = 16000  # whisper.cpp expects 16 kHz audio

try:
    # Decode to raw mono 16-bit PCM on stdout
    y, _ = (
        ffmpeg.input("/path/to/audio.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(
            cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
        )
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e

# Scale int16 samples to float32 in [-1.0, 1.0)
arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0

w.transcribe(arr)
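The last conversion step above (raw s16le bytes scaled by 32768.0) can be sketched with the standard library alone, which may help when checking what the NumPy expression does. The helper name pcm16_to_float is hypothetical and not part of whispercpp; the result is a plain Python list, so it would still need to be wrapped in a float32 NumPy array before being passed to transcribe.

```python
import array
import sys


def pcm16_to_float(raw: bytes) -> list:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0).

    Hypothetical helper for illustration; not part of whispercpp.
    """
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # the raw stream is little-endian (s16le)
    # Same scaling as the NumPy expression: int16 / 32768.0
    return [s / 32768.0 for s in samples]


if __name__ == "__main__":
    # Three samples: silence, half of positive full scale, negative full scale
    raw = b"\x00\x00" + b"\x00\x40" + b"\x00\x80"
    print(pcm16_to_float(raw))  # prints [0.0, 0.5, -1.0]
```

Dividing by 32768.0 rather than 32767.0 maps the most negative int16 value exactly to -1.0, which is why the upper bound of the range is exclusive.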
You can also use the model's transcribe_from_file method for convenience:
w.transcribe_from_file("/path/to/audio.wav")
The Pybind11 bindings support all of the features of whisper.cpp and take inspiration from whisper-rs.
The binding can also be used via api:
from whispercpp import api
# Bindings directly from whisper.cpp
Development
See DEVELOPMENT.md
APIs
Whisper
- Whisper.from_pretrained(model_name: str) -> Whisper
  Load a pre-trained model from the local cache or download and cache if needed. Supports loading a custom ggml model from a local path passed as model_name.
    w = Whisper.from_pretrained("tiny.en")
    w = Whisper.from_pretrained("/path/to/model.bin")
  The model will be saved to $XDG_DATA_HOME/whispercpp or ~/.local/share/whispercpp if the environment variable is not set.
- Whisper.transcribe(arr: NDArray[np.float32], num_proc: int = 1)
  Running transcription on a given Numpy array. This calls full from whisper.cpp. If num_proc is greater than 1, it will use full_parallel instead.
    w.transcribe(np.ones((1, 16000)))
  To transcribe from a WAV file use transcribe_from_file:
    w.transcribe_from_file("/path/to/audio.wav")
- Whisper.stream_transcribe(*, length_ms: int=..., device_id: int=..., num_proc: int=...) -> Iterator[str]
  [EXPERIMENTAL] Streaming transcription. This calls stream_ from whisper.cpp. The transcription will be yielded as soon as it's available. See stream.py for an example.
  Note: The device_id is the index of the audio device. You can use whispercpp.api.available_audio_devices to get the list of available audio devices.
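The storage location described for from_pretrained can be worked out with a few lines of standard-library Python. This is only a sketch of the documented rule, not the library's actual code: the helper name model_cache_dir is hypothetical, and treating an empty XDG_DATA_HOME the same as an unset one is an assumption taken from the XDG Base Directory Specification.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def model_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory where saved weights would be expected.

    Hypothetical helper for illustration; not part of whispercpp.
    """
    env = os.environ if env is None else env
    xdg_data_home = env.get("XDG_DATA_HOME")
    if xdg_data_home:  # assumption: an empty value counts as "not set"
        return Path(xdg_data_home) / "whispercpp"
    # Fallback when the environment variable is not set
    return Path.home() / ".local" / "share" / "whispercpp"


if __name__ == "__main__":
    print(model_cache_dir())
```

Passing the environment as a mapping keeps the function easy to check without modifying the real process environment.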
api
api is a direct binding from whisper.cpp that has a similar API to whisper-rs.
- api.Context
  This class is a wrapper around whisper_context.
    from whispercpp import api
    ctx = api.Context.from_file("/path/to/saved_weight.bin")
  Note: The context can also be accessed from the Whisper class via w.context
- api.Params
  This class is a wrapper around whisper_params.
    from whispercpp import api
    params = api.Params()
  Note: The params can also be accessed from the Whisper class via w.params
Why not?
- whispercpp.py. There are a few key differences here:
  - They provide Cython bindings. From the UX standpoint, this achieves the same goal as whispercpp. The difference is that whispercpp uses Pybind11 instead. Feel free to use it if you prefer Cython over Pybind11. Note that whispercpp.py and whispercpp are mutually exclusive, as they both use the whispercpp namespace.
  - whispercpp provides APIs similar to whisper-rs, which provides a nicer UX to work with. There are literally two APIs (from_pretrained and transcribe) to quickly use whisper.cpp in Python.
  - whispercpp doesn't pollute your $HOME directory; rather, it follows the XDG Base Directory Specification for saved weights.
- Using cdll and ctypes and be done with it?
  - This is also valid, but it requires a lot of hacking and is pretty slow compared to Cython and Pybind11.
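To give a sense of the manual work the ctypes route involves, here is a minimal sketch that binds a single C function by hand. It uses strlen from the C standard library as a stand-in, not anything from whisper.cpp, and it assumes a POSIX system where the C library can be located. The wrapper name c_strlen is hypothetical.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. On POSIX, CDLL(None) falls back to
# the symbols already loaded into the current process if nothing is found.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Each function's signature has to be declared by hand; ctypes cannot infer it.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C strlen function on a bytes object (illustrative wrapper)."""
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"whisper"))  # prints 7
```

A real binding would repeat these declarations for every function and struct it exposes, which is the hacking referred to above.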
Examples
See examples for more information