
Are calls to whisper_full thread-safe?

Open jaybinks opened this issue 2 years ago • 3 comments

If I load a model once and then create new whisper contexts in multiple threads, should I expect to be able to run whisper on the same model in separate threads?

Right now I'm doing this with the Go bindings, so it may be an issue in the Go bindings, but I wanted to check whether you had thought of this.

jaybinks · Dec 28 '22 07:12

When I try to run 2 threads over the same model, I get this:

Assertion failed: (ggml_can_repeat(a, b)), function ggml_repeat, file ggml.c, line 2205.
SIGABRT: abort

jaybinks · Dec 28 '22 07:12

Silly me... line 34 of whisper.h says: "The following interface is thread-safe as long as the [same] whisper_context is not used by multiple threads concurrently."

I was hoping to get a speedup by only loading the model once, and then re-using the same in-memory model a number of times.

jaybinks · Dec 29 '22 06:12

There is actually a way to reuse the same context in multiple threads in parallel. The approach is demonstrated in the whisper_full_parallel() function:

https://github.com/ggerganov/whisper.cpp/blob/9a8ad3db697c628a87380d637f5eb0f72739a838/whisper.cpp#L3102-L3245

The newly created contexts reuse the model from the original context and only allocate memory for their own key-value memory tensors, which are significantly smaller compared to the entire model.

In this function, a single audio sample is divided into N equal parts and processed in parallel using just one model. It can easily be modified to process different audio samples instead, if that is the goal.

ggerganov · Dec 29 '22 12:12
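
For readers who want to see the shape of the pattern described above, here is a minimal C++ sketch. It is not the whisper.cpp API and not the code of whisper_full_parallel(): `Model`, `Context`, `process_chunk` and `process_parallel` are hypothetical stand-ins, and the "inference" is a placeholder that just scales and sums samples. It only illustrates the idea from the thread: one read-only model shared by all threads, one private context per thread, and the audio split into N parts. Giving the remainder samples to the last part is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the loaded model: large and read-only after loading.
struct Model {
    float gain;
};

// Hypothetical stand-in for a per-thread context: it borrows the shared model
// and owns its own scratch memory (loosely analogous to per-context KV tensors).
struct Context {
    const Model* model;          // shared, never written to by workers
    std::vector<float> scratch;  // private to this context
};

// Placeholder for inference: scales the chunk by the model gain and sums it.
float process_chunk(Context& ctx, const float* samples, std::size_t n) {
    ctx.scratch.assign(samples, samples + n);
    for (float& s : ctx.scratch) s *= ctx.model->gain;
    return std::accumulate(ctx.scratch.begin(), ctx.scratch.end(), 0.0f);
}

// Splits `audio` into n_parts chunks and processes each on its own thread.
// Every thread gets its own Context; only the Model is shared.
std::vector<float> process_parallel(const Model& model,
                                    const std::vector<float>& audio,
                                    std::size_t n_parts) {
    assert(n_parts > 0);
    std::vector<Context> contexts(n_parts, Context{&model, {}});
    std::vector<float> results(n_parts, 0.0f);
    std::vector<std::thread> workers;
    const std::size_t chunk = audio.size() / n_parts;

    for (std::size_t i = 0; i < n_parts; ++i) {
        const std::size_t begin = i * chunk;
        // Assumption of this sketch: the last part also takes any remainder.
        const std::size_t end = (i + 1 == n_parts) ? audio.size() : begin + chunk;
        workers.emplace_back([&, i, begin, end] {
            // Each worker touches only contexts[i] and results[i].
            results[i] = process_chunk(contexts[i], audio.data() + begin, end - begin);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Calling `process_parallel(model, audio, 4)` returns one result per part, in order. The point of the design is that no two threads ever use the same `Context` concurrently, which matches the thread-safety condition quoted from whisper.h, while the expensive `Model` exists only once in memory. To process different audio samples instead of parts of one sample, each worker would be handed its own buffer rather than a slice.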