whispercpp
Pybind11 bindings for whisper.cpp
Quickstart
Install from source:
pip install git+https://github.com/AIWintermuteAI/whispercpp.git -vv
Alternatively, clone the develop branch of the repository and initialize all submodules:
git submodule update --init --recursive
[!IMPORTANT] If installing on Raspberry Pi OS Lite (this might apply to other images as well), you first need to install some additional packages with apt-get:
sudo apt-get install libasound2-dev python3-dev python3-pip
Then build the wheel:
# Option 1: using pypa/build
python3 -m build -w
# Option 2: using bazel
./tools/bazel build //:whispercpp_wheel
Afterwards, install the wheel:
# Option 1: via pypa/build
pip install dist/*.whl
# Option 2: using bazel
pip install $(./tools/bazel info bazel-bin)/*.whl
The binding provides a Whisper class:
from whispercpp import Whisper
w = Whisper.from_pretrained("tiny.en")
Currently, the inference API is provided via transcribe:
import numpy as np
w.transcribe(np.ones((1, 16000)))
You can use any of your favorite audio libraries (ffmpeg, librosa, or whispercpp.api.load_wav_file) to load audio files into a NumPy array, then pass it to transcribe:
import ffmpeg
import numpy as np
sample_rate = 16000  # whisper.cpp models expect 16 kHz mono audio
try:
    y, _ = (
        ffmpeg.input("/path/to/audio.wav", threads=0)
        .output("-", format="s16le", acodec="pcm_s16le", ac=1, ar=sample_rate)
        .run(
            cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
        )
    )
except ffmpeg.Error as e:
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
arr = np.frombuffer(y, np.int16).flatten().astype(np.float32) / 32768.0
w.transcribe(arr)
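If ffmpeg is not available, the same conversion (16-bit PCM to mono float32 in the range [-1, 1)) can be sketched with Python's standard wave module. This is a minimal sketch, not part of whispercpp: the helper name load_wav_as_float32 is hypothetical, and it assumes the file is already 16-bit PCM sampled at 16 kHz, since it does no resampling.

```python
import wave

import numpy as np

WHISPER_SAMPLE_RATE = 16000  # whisper.cpp models expect 16 kHz input


def load_wav_as_float32(path: str) -> np.ndarray:
    """Read a 16-bit PCM WAV file and return mono float32 samples in [-1, 1).

    Hypothetical helper for illustration; it is not part of whispercpp.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wf.getframerate() != WHISPER_SAMPLE_RATE:
            raise ValueError(
                f"expected {WHISPER_SAMPLE_RATE} Hz audio, got {wf.getframerate()} Hz"
            )
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())

    # Same scaling as the ffmpeg example: int16 -> float32 in [-1, 1).
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Frames are interleaved per channel; average the channels to get mono.
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples


# Usage, assuming w is the Whisper instance created earlier:
# w.transcribe(load_wav_as_float32("/path/to/audio.wav"))
```

The helper raises on unexpected sample widths or rates instead of converting silently, so a mismatched file fails loudly rather than producing a poor transcription.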
You can also use the model's transcribe_from_file method for convenience:
w.transcribe_from_file("/path/to/audio.wav")
The Pybind11 bindings support all of the features of whisper.cpp, with an API that takes inspiration from whisper-rs.
The binding can also be used via api:
from whispercpp import api
# Binding directly from whisper.cpp
Development
See DEVELOPMENT.md
Official Builds
| Build Type | Status | Note |
|---|---|---|
| Linux / macOS Wheels | | |
| Unit tests | | |
Examples
See examples for more information.
APIs
Whisper
- Whisper.from_pretrained(model_name: str) -> Whisper

  Load a pre-trained model from the local cache, or download and cache it if needed. Also supports loading a custom ggml model from a local path passed as model_name.

      w = Whisper.from_pretrained("tiny.en")
      w = Whisper.from_pretrained("/path/to/model.bin")

  The model will be saved to $XDG_DATA_HOME/whispercpp, or to ~/.local/share/whispercpp if the environment variable is not set.
- Whisper.transcribe(arr: NDArray[np.float32], num_proc: int = 1)

  Run transcription on a given NumPy array. This calls full from whisper.cpp. If num_proc is greater than 1, it will use full_parallel instead.

      w.transcribe(np.ones((1, 16000)))

  To transcribe from a WAV file, use transcribe_from_file:

      w.transcribe_from_file("/path/to/audio.wav")
- Whisper.stream_transcribe(*, length_ms: int=..., device_id: int=..., num_proc: int=...) -> Iterator[str]

  [EXPERIMENTAL] Streaming transcription. This calls stream_ from whisper.cpp. The transcription will be yielded as soon as it's available. See stream.py for an example.

  Note: The device_id is the index of the audio device. You can use whispercpp.api.available_audio_devices to get the list of available audio devices.
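Because stream_transcribe returns an iterator of strings, it can be consumed with an ordinary for loop. The sketch below is illustrative only: collect_stream and main are hypothetical helpers, the length_ms and device_id values are placeholder assumptions, and calling api.available_audio_devices with no arguments is also an assumption. The loop stops after a fixed number of segments, since it is not stated whether the stream ends on its own.

```python
from typing import Any, List


def collect_stream(
    model: Any,
    *,
    device_id: int = 0,      # assumed placeholder: index of the audio device
    length_ms: int = 5000,   # assumed placeholder value
    max_segments: int = 3,
) -> List[str]:
    """Collect up to max_segments pieces of text yielded by model.stream_transcribe."""
    segments: List[str] = []
    for text in model.stream_transcribe(length_ms=length_ms, device_id=device_id):
        print(text)  # each piece arrives as soon as it is available
        segments.append(text)
        if len(segments) >= max_segments:
            break
    return segments


def main() -> None:
    # Imported lazily so collect_stream can be reused without the wheel installed.
    from whispercpp import Whisper, api

    # Assumption: available_audio_devices is callable with no arguments
    # and returns something printable.
    print(api.available_audio_devices())

    model = Whisper.from_pretrained("tiny.en")
    collect_stream(model, device_id=0)
```

Passing the model in as an argument keeps the loop independent of how the Whisper instance was created, so the same helper works with a custom ggml model path.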
api
api is a direct binding to whisper.cpp, with an API similar to whisper-rs.
- api.Context

  This class is a wrapper around whisper_context.

      from whispercpp import api
      ctx = api.Context.from_file("/path/to/saved_weight.bin")

  Note: The context can also be accessed from the Whisper class via w.context.
- api.Params

  This class is a wrapper around whisper_params.

      from whispercpp import api
      params = api.Params()

  Note: The params can also be accessed from the Whisper class via w.params.
Why not?
- whispercpp.py. There are a few key differences here:
  - They provide Cython bindings. From the UX standpoint, this achieves the same goal as whispercpp. The difference is that whispercpp uses Pybind11 instead. Feel free to use it if you prefer Cython over Pybind11. Note that whispercpp.py and whispercpp are mutually exclusive, as they both use the whispercpp namespace.
  - whispercpp provides APIs similar to whisper-rs, which gives a nicer UX to work with. There are literally two APIs (from_pretrained and transcribe) to quickly use whisper.cpp in Python.
  - whispercpp doesn't pollute your $HOME directory; rather, it follows the XDG Base Directory Specification for saved weights.
- Using cdll and ctypes and be done with it?
  - This is also valid, but it requires a lot of hacking and is pretty slow compared to Cython and Pybind11.
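The XDG-based location for saved weights mentioned above ($XDG_DATA_HOME/whispercpp, falling back to ~/.local/share/whispercpp) can be illustrated with a short sketch. This is not the library's actual implementation: the function name whispercpp_data_dir is hypothetical, and it only shows the lookup rule described in this README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def whispercpp_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory where saved weights would live under the XDG rule.

    Hypothetical helper for illustration; it is not part of whispercpp.
    """
    env = os.environ if env is None else env
    xdg_data_home = env.get("XDG_DATA_HOME")
    # An unset or empty XDG_DATA_HOME falls back to ~/.local/share.
    base = Path(xdg_data_home) if xdg_data_home else Path.home() / ".local" / "share"
    return base / "whispercpp"


if __name__ == "__main__":
    print(whispercpp_data_dir())
```

Taking the environment as an optional argument makes the rule easy to check without modifying the real process environment.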
 
