whisper.cpp
[Feature Request] Support openai/whisper "--initial_prompt" flag
I would love to add initial prompts to increase the accuracy of the transcription, based on this discussion: https://github.com/openai/whisper/discussions/66 How do I go about implementing it here?
Many thanks!
The prompt_past member of the context contains the tokens that have been decoded so far during the transcription:
https://github.com/ggerganov/whisper.cpp/blob/a6c786d5dccf5f54a92da890f70715b4c9831172/whisper.cpp#L2388-L2392
Currently, it is empty when we start processing an audio file and grows up to whisper_n_text_ctx(ctx)/2 tokens.
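To make the role of prompt_past more concrete, here is a minimal, self-contained sketch of how a decoder prompt could be assembled from it. This is not the code behind the link above: the token type, the build_prompt() name and the token ids are assumptions made for illustration, and only the n_text_ctx/2 cap comes from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using token = int; // stand-in for the library's token type (assumption)

// Hypothetical helper: builds the prompt for the next audio chunk as
//   [prev marker] + last n_text_ctx/2 tokens of prompt_past + prompt_init
// and trims prompt_past down to the tokens that were actually used.
std::vector<token> build_prompt(std::vector<token> & prompt_past,
                                const std::vector<token> & prompt_init,
                                int n_text_ctx,
                                token token_prev) {
    assert(n_text_ctx >= 2);

    std::vector<token> prompt;
    if (!prompt_past.empty()) {
        // never condition on more than half of the text context
        const int n_take = std::min(n_text_ctx / 2, (int) prompt_past.size());

        prompt.push_back(token_prev);
        prompt.insert(prompt.end(), prompt_past.end() - n_take, prompt_past.end());

        // keep only the most recent n_take tokens as the running context
        prompt_past.assign(prompt.begin() + 1, prompt.end());
    }

    // the usual start-of-transcript tokens always come last
    prompt.insert(prompt.end(), prompt_init.begin(), prompt_init.end());
    return prompt;
}
```

With an empty prompt_past the result is just prompt_init, which matches the current behaviour at the start of an audio file. Pre-filling prompt_past before the first chunk is what an initial prompt would amount to in this sketch.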
So to achieve the --initial_prompt functionality, you need to tokenize the input text and pass the resulting tokens to prompt_past in the context. For example, this can be done by extending the C interface with a whisper_set_prompt() function, similar to whisper_set_mel():
https://github.com/ggerganov/whisper.cpp/blob/a6c786d5dccf5f54a92da890f70715b4c9831172/whisper.h#L86-L96
You would also extend the whisper_full_params struct to accept a const char * prompt string, which is used when no_context == false:
https://github.com/ggerganov/whisper.cpp/blob/a6c786d5dccf5f54a92da890f70715b4c9831172/whisper.h#L168
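A rough sketch of what these two extensions might look like, using toy types so that it compiles on its own. Everything here is hypothetical: toy_context, toy_params, toy_set_prompt() and toy_uses_prompt() are illustration names, not part of whisper.h, and keeping the most recent tokens when the prompt is too long is an assumption.

```cpp
#include <cassert>
#include <vector>

using token = int; // stand-in for the library's token type (assumption)

// Toy stand-in for the real context; only the fields needed here.
struct toy_context {
    int n_text_ctx = 448;           // example value for the text context size
    std::vector<token> prompt_past; // tokens used to condition the decoder
};

// Toy stand-in for the proposed additions to the params struct.
struct toy_params {
    bool no_context = false;
    const char * prompt = nullptr;  // proposed new field
};

// Hypothetical setter in the style of a C interface: copies at most
// n_text_ctx/2 tokens into prompt_past, keeping the most recent ones.
// Returns the number of tokens stored.
int toy_set_prompt(toy_context & ctx, const token * tokens, int n_tokens) {
    assert(tokens != nullptr || n_tokens <= 0);

    if (n_tokens <= 0) {
        ctx.prompt_past.clear();
        return 0;
    }

    const int n_max  = ctx.n_text_ctx / 2;
    const int n_keep = n_tokens < n_max ? n_tokens : n_max;

    ctx.prompt_past.assign(tokens + (n_tokens - n_keep), tokens + n_tokens);
    return n_keep;
}

// The prompt is only meaningful when context is enabled and the string is non-empty.
bool toy_uses_prompt(const toy_params & params) {
    return !params.no_context && params.prompt != nullptr && params.prompt[0] != '\0';
}
```

In this sketch the caller would tokenize params.prompt when toy_uses_prompt() returns true and hand the tokens to toy_set_prompt() before decoding the first chunk.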
Regarding the tokenization of the text - you can try to directly use the GPT tokenization function that I implemented here:
https://github.com/ggerganov/ggml/blob/624e4f531370d254b4e06268506f704524f57dc9/examples/utils.h#L53-L65
It's probably not 100% correct, but I found that it generally works.
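For readers who want a feel for this kind of tokenization without opening the link, below is a toy greedy longest-match tokenizer. It is not the linked function: the toy_vocab type, the toy_tokenize() name and the choice to silently skip unknown characters are assumptions. Greedy matching only approximates real BPE merging, which may be one reason an approach like this is not always exact.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy vocabulary: token text -> token id (ids are made up by the caller).
using toy_vocab = std::map<std::string, int>;

// Hypothetical tokenizer: at each position, emit the id of the longest
// vocabulary entry that matches, then continue after it.
std::vector<int> toy_tokenize(const toy_vocab & vocab, const std::string & text) {
    std::vector<int> ids;

    std::size_t i = 0;
    while (i < text.size()) {
        bool found = false;

        // try the longest candidate first, then shrink it one byte at a time
        for (std::size_t len = text.size() - i; len > 0; --len) {
            const auto it = vocab.find(text.substr(i, len));
            if (it != vocab.end()) {
                ids.push_back(it->second);
                i += len;
                found = true;
                break;
            }
        }

        if (!found) {
            ++i; // no entry starts here: skip one byte (assumption)
        }
    }

    return ids;
}
```

The resulting ids would then be passed on as the initial prompt tokens. With the real model vocabulary, it is worth comparing a few outputs against the reference Python tokenizer before relying on them.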