
[Feature Request] Support openai/whisper "--initial_prompt" flag

Open · ninjalu opened this issue 1 year ago · 1 comment

I would love to add initial prompts to increase the accuracy of the transcription, based on this discussion: https://github.com/openai/whisper/discussions/66. How do I go about implementing it here?

Many thanks!

ninjalu · Oct 25 '22 20:10

The prompt_past member of the context contains the tokens that have been decoded so far during the transcription:

https://github.com/ggerganov/whisper.cpp/blob/a6c786d5dccf5f54a92da890f70715b4c9831172/whisper.cpp#L2388-L2392

Currently, it is empty when we start processing audio and grows up to whisper_n_text_ctx(ctx)/2 tokens.

So to achieve the --initial_prompt functionality, you need to tokenize the input text and pass the resulting tokens to prompt_past in the context. For example, this can be done by extending the C interface with a whisper_set_prompt() function, similar to whisper_set_mel():

https://github.com/ggerganov/whisper.cpp/blob/a6c786d5dccf5f54a92da890f70715b4c9831172/whisper.h#L86-L96
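To make the idea concrete, here is a rough sketch of what such a setter might look like. It does not use the real whisper.cpp types: sketch_context and sketch_set_prompt are hypothetical stand-ins, the 448 default is only an assumed context size, and keeping the most recent tokens when the prompt is too long is a design assumption rather than something the library specifies.

```cpp
#include <vector>

// Hypothetical stand-in for the real context; only the relevant fields.
struct sketch_context {
    int n_text_ctx = 448;          // assumed text context size
    std::vector<int> prompt_past;  // tokens the decoder sees as prior context
};

// Hypothetical setter: stores at most n_text_ctx/2 tokens in prompt_past,
// keeping the most recent ones if the prompt is longer than that.
// Returns the number of tokens stored, or -1 on invalid input.
int sketch_set_prompt(sketch_context & ctx, const int * tokens, int n_tokens) {
    if (tokens == nullptr || n_tokens < 0) {
        return -1;
    }
    const int n_max  = ctx.n_text_ctx / 2;
    const int n_keep = n_tokens < n_max ? n_tokens : n_max;
    // Copy the last n_keep tokens of the input.
    ctx.prompt_past.assign(tokens + (n_tokens - n_keep), tokens + n_tokens);
    return n_keep;
}
```

The cap mirrors the whisper_n_text_ctx(ctx)/2 limit mentioned above, so a user-supplied prompt cannot take up more room than the decoded history normally would.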

You would also extend the whisper_full_params struct to accept a const char * prompt string, which is used when no_context == false:

https://github.com/ggerganov/whisper.cpp/blob/a6c786d5dccf5f54a92da890f70715b4c9831172/whisper.h#L168
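As a sketch of how the extra parameter might be consumed, the snippet below gates the prompt on no_context. The struct is a hypothetical stand-in, not the real whisper_full_params; the prompt field and the sketch_initial_context helper are assumptions for illustration, and the tokenizer is passed in as a callback so the example stays self-contained.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the parameter struct; only the relevant fields.
struct sketch_full_params {
    bool         no_context = false;    // same meaning as the flag discussed above
    const char * prompt     = nullptr;  // hypothetical new field: initial prompt text
};

using sketch_tokenizer = std::function<std::vector<int>(const std::string &)>;

// Returns the tokens that would seed prompt_past before decoding starts.
// The prompt is ignored when context is disabled or no prompt was given.
std::vector<int> sketch_initial_context(const sketch_full_params & params,
                                        const sketch_tokenizer   & tokenize) {
    if (params.no_context || params.prompt == nullptr) {
        return {};
    }
    return tokenize(params.prompt);
}
```

Ignoring the prompt when no_context is set keeps the flag's meaning simple: the decoder then starts without any prior text at all.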

As for tokenizing the text, you can try to directly use the GPT tokenization function that I implemented here:

https://github.com/ggerganov/ggml/blob/624e4f531370d254b4e06268506f704524f57dc9/examples/utils.h#L53-L65

It's probably not 100% correct, but I found that it generally works.
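For readers who want a feel for what an approximate tokenizer can look like, here is a minimal greedy longest-match sketch over a toy vocabulary. It is not the function linked above and it is not true byte-pair encoding; sketch_tokenize and the vocabulary map are hypothetical, and skipping characters with no vocabulary entry is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Greedy longest-match: at each position, take the longest vocabulary
// entry that matches the upcoming text and emit its token id.
std::vector<int> sketch_tokenize(const std::map<std::string, int> & vocab,
                                 const std::string & text) {
    std::vector<int> out;
    std::size_t i = 0;
    while (i < text.size()) {
        bool found = false;
        // Try the longest candidate first, then shrink.
        for (std::size_t len = text.size() - i; len > 0; --len) {
            auto it = vocab.find(text.substr(i, len));
            if (it != vocab.end()) {
                out.push_back(it->second);
                i += len;
                found = true;
                break;
            }
        }
        if (!found) {
            ++i;  // no entry covers this character: skip it (assumption)
        }
    }
    return out;
}
```

A greedy match can split text differently from the tokenizer the model was trained with, which is one way a prompt tokenizer can be "not 100% correct" and still work reasonably well in practice.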

ggerganov · Oct 26 '22 06:10