whisper.cpp
Are calls to whisper_full thread-safe?
If I load a model once and then create new whisper contexts in multiple threads, should I expect to be able to run whisper on the same model from separate threads?
Right now I'm doing this with the Go bindings, so it may be an issue in the Go bindings, but I wanted to check whether you had thought about this.
When I try to run 2 threads over the same model, I get this:
Assertion failed: (ggml_can_repeat(a, b)), function ggml_repeat, file ggml.c, line 2205. SIGABRT: abort
Silly me... line 34 of whisper.h says:
The following interface is thread-safe as long as the sample whisper_context is not used by multiple threads concurrently.
I was hoping to get a speedup by only loading the model once, and then re-using the same in-memory model a number of times.
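For reference, the most conservative way to honor that rule is to serialize every call on a shared context behind a mutex. The sketch below is only an illustration under stated assumptions: `Context`, `GuardedContext` and `run_from_threads` are hypothetical stand-ins, not part of the whisper.cpp API, and `run()` merely marks where a whisper_full-style call would go.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a whisper_context: mutable state that is not
// safe to touch from two threads at the same time.
struct Context {
    int calls = 0;
};

// Wrapper that serializes access so the context is never used concurrently.
class GuardedContext {
public:
    // Placeholder for a whisper_full-style call on the wrapped context.
    void run() {
        std::lock_guard<std::mutex> lock(mu_);
        ++ctx_.calls;
    }

    // Number of completed calls, read under the same lock.
    int calls() {
        std::lock_guard<std::mutex> lock(mu_);
        return ctx_.calls;
    }

private:
    std::mutex mu_;
    Context ctx_;
};

// Launch n_threads workers that each call run() per_thread times,
// then return the total number of calls observed.
int run_from_threads(GuardedContext &g, int n_threads, int per_thread) {
    assert(n_threads > 0 && per_thread >= 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([&g, per_thread] {
            for (int k = 0; k < per_thread; ++k) g.run();
        });
    }
    for (auto &t : threads) t.join();
    return g.calls();
}
```

This is safe, but the calls run one after another, so it gives no speedup over a single thread.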
There is actually a way to reuse the same context in multiple threads in parallel.
The approach is demonstrated in the whisper_full_parallel() function:
https://github.com/ggerganov/whisper.cpp/blob/9a8ad3db697c628a87380d637f5eb0f72739a838/whisper.cpp#L3102-L3245
The newly created contexts reuse the model from the original context and only allocate memory for their own key-value memory tensors, which are significantly smaller compared to the entire model.
In this function, a single audio sample is divided into N equal parts, which are processed in parallel using just one model.
But it can be easily modified to process different audio samples if that is the goal.
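The general shape of that approach can be sketched as follows. This is a minimal illustration, not the code from whisper_full_parallel(): `Model`, `WorkerState`, `process_chunk` and `process_parallel` are hypothetical names, the "inference" is a placeholder sum, and letting the last worker take the leftover samples is a choice made for this sketch. What it shows is one read-only model shared by all threads, with each thread owning a small private state.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the loaded model weights: shared and read-only.
struct Model {
    float gain;
};

// Hypothetical stand-in for per-worker state (the role the key-value memory
// tensors play in the description above): never shared between threads.
struct WorkerState {
    const Model *model = nullptr;  // borrowed, read-only
    std::vector<float> scratch;    // private working memory
    float result = 0.0f;
};

// Placeholder for the real inference call; it only writes to its own state.
void process_chunk(WorkerState &st, const float *samples, std::size_t n) {
    st.scratch.assign(samples, samples + n);
    float sum = 0.0f;
    for (float s : st.scratch) sum += s * st.model->gain;
    st.result = sum;
}

// Split samples into n_workers contiguous parts and process them in parallel.
// The last worker also takes any remainder (an assumption of this sketch).
std::vector<float> process_parallel(const Model &model,
                                    const std::vector<float> &samples,
                                    int n_workers) {
    assert(n_workers > 0);
    std::vector<WorkerState> states(static_cast<std::size_t>(n_workers));
    std::vector<std::thread> threads;
    const std::size_t chunk = samples.size() / static_cast<std::size_t>(n_workers);

    for (int i = 0; i < n_workers; ++i) {
        WorkerState &st = states[static_cast<std::size_t>(i)];
        st.model = &model;
        const std::size_t begin = static_cast<std::size_t>(i) * chunk;
        const std::size_t end = (i == n_workers - 1) ? samples.size() : begin + chunk;
        threads.emplace_back(process_chunk, std::ref(st),
                             samples.data() + begin, end - begin);
    }
    for (auto &t : threads) t.join();

    // Collect per-worker results in chunk order.
    std::vector<float> results;
    for (const auto &st : states) results.push_back(st.result);
    return results;
}
```

To process different audio samples instead of parts of one, each worker would be handed its own buffer rather than a slice of a shared one; the split between a shared model and private state stays the same.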