whisper.cpp
Pybind11 Issues
I have made bindings for almost all of the functions and am trying to get this working in Python, but I am encountering errors. Some of this code, such as the paths, has been changed for the sake of clarity.
`import whisperbind
import sys
from scipy.io import wavfile
import numpy as np

class Whisper():
    def __init__(self, model = None, params = None, print_ = False):
        if model is None:
            model_path = '/whisper.cpp/models/ggml-base.en.bin'
        else:
            model_path = f'/whisper.cpp/models/ggml-{model}.en.bin'
        if params is None:
            self.params = whisperbind.whisper_full_default_params(whisperbind.WHISPER_SAMPLING_GREEDY)
        if not print_:
            self.params.print_progress = False
        self.context = whisperbind.whisper_init(model_path)
        if not self.context:
            print("Context is null")
            sys.exit(0)
        print(f"Context: {self.context}")

    def full(self, samples):
        samples_input = samples.astype(np.float32)
        if samples_input.ndim == 2:
            samples_input = samples_input[:,0]
        n_samples = len(samples_input)
        return whisperbind.whisper_full(self.context, self.params, samples_input, n_samples)

audio_path = "/whisper/tests/jfk.wav"
sr, audio = wavfile.read(audio_path)
whisper = Whisper()
t = whisper.full(audio)`
When I run the full function, it never seems to return. Based on print statements I have added, it gets stuck in the whisper encode function in C++ on the line ggml_graph_compute (ctxL, &gf);
I wanted to know if there is something I am missing when calling whisper_full, or if there is an intermediate step I have skipped that would cause this issue. It may be necessary to see my pybind code, but there isn't much that can't be understood from the Python code, since most type conversions are automatic except for conversions from numpy arrays to vectors/C arrays.
Hmm, looks OK overall - there isn't any extra step necessary.
Maybe try to normalize the PCM to be in [-1.0, 1.0], as demonstrated here:
https://github.com/ggerganov/whisper.cpp/issues/9#issuecomment-1272555209
This example used to work before. Now the C API has changed and I haven't updated it, but the general idea is the same.
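For readers following along, here is a minimal sketch of the normalization step suggested above. It is not taken from the linked comment. The helper name `pcm_to_float32` is hypothetical, and the sketch assumes the audio came from `scipy.io.wavfile.read`, which returns signed integer samples (for example int16) for integer PCM files. A plain `astype(np.float32)` keeps those integer magnitudes, so the values have to be scaled down explicitly.

```python
import numpy as np


def pcm_to_float32(samples: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return mono float32 samples scaled into [-1.0, 1.0]."""
    if samples.ndim == 2:
        # Keep only the first channel, as the wrapper in the question does.
        samples = samples[:, 0]
    if np.issubdtype(samples.dtype, np.signedinteger):
        # Divide by the magnitude of the most negative value of the integer
        # type (32768 for int16) so the result lies in [-1.0, 1.0).
        scale = float(-int(np.iinfo(samples.dtype).min))
        return samples.astype(np.float32) / np.float32(scale)
    if np.issubdtype(samples.dtype, np.floating):
        # Assumption: floating-point WAV data is already in range.
        return samples.astype(np.float32)
    raise TypeError(f"unsupported sample dtype: {samples.dtype}")
```

In the wrapper from the question, this would take the place of the bare `samples.astype(np.float32)` call at the start of `full`.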
The other thing you want to verify is that your self.params structure correctly matches the C struct in whisper.h.
Also, I had some issues in the past when passing pointers (i.e. the context and the samples_input) - double-check that your code is doing it correctly.
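One pointer-related detail that can be checked from the Python side is the memory layout of the sample array. The sketch below is a general precaution and is not confirmed as the cause of the hang in this thread. Slicing one channel out of a stereo array (`samples[:, 0]`) yields a strided view, and a binding that reads the underlying buffer as a plain `float*` would then step through both channels. The helper name `as_c_float_buffer` is hypothetical.

```python
import numpy as np


def as_c_float_buffer(samples: np.ndarray) -> np.ndarray:
    """Hypothetical helper: return a 1-D, C-contiguous float32 copy or view."""
    out = np.ascontiguousarray(samples, dtype=np.float32)
    if out.ndim != 1:
        raise ValueError("expected mono (1-D) samples")
    return out


# A stereo buffer stored as (n_frames, n_channels); the values are placeholders.
stereo = np.arange(16, dtype=np.float32).reshape(8, 2)
left = stereo[:, 0]  # strided view: every second float in memory

fixed = as_c_float_buffer(left)
print("view contiguous: ", left.flags["C_CONTIGUOUS"])
print("fixed contiguous:", fixed.flags["C_CONTIGUOUS"])
```

Passing `fixed` rather than `left` to the binding means the first `n_samples` floats in memory are exactly the samples intended.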
@NebilI Did you achieve this in the end? Could you please share the whisperbind package?
@o4dev Sorry for not responding. I was on vacation and this project was on hold. I have not made any progress on the bug. If you would like to help out with the endeavor let me know.
@ggerganov I finally picked this back up, and it looks like the for loop in the function 'ggml_graph_compute' takes too long and changes the end condition. I have done the normalization as shown in the example you mentioned. I thought the GIL might be an issue if this was threaded, but I don't think it is. The for loop in the debugging section below also prints the elements as expected. Here is the pybind11 C++ code for the whisper_full function:
test.def("whisper_full", [](struct whisper_context * ctx, struct whisper_full_params params, py::array_t<float> &samples, int n_samples) {
    int response;
    int i;
    // cout << "\nIs this working???????\n";
    py::buffer_info sample_buff = samples.request();
    float* ptr1 = static_cast<float *>(sample_buff.ptr);
    // debugging
    for(i = 0; i < 30; i++) {
        cout << ptr1[i] << i << "\n";
    }
    response = whisper_full(ctx, params, ptr1, n_samples);
    return response;
}, "whisper_full");
I ended up exposing the API to Python using Cython instead, as I needed to use it elsewhere in my project anyway. I uploaded the rudimentary implementation of whisper.cpp as a Python package here: https://GitHub.com/o4dev/whispercpp.py
If it's of any use to you, it can be installed straight from pip using git (which should handle the whisper.cpp compilation too). Although it contains all of the API's header definitions in Cython, at present the main Whisper class uses only a few of them, enough to facilitate basic text retrieval. It does, however, handle virtually any audio type. Functionally it is very similar to the original whisper transcribe: it downloads models automatically, converts with ffmpeg, etc.