Whisper Add an option for a prompt

Could it be possible please?

Mar 25 '23 22:03 User1231300

Update: it was added in the official whisper.cpp

Mar 31 '23 18:03 User1231300

@User1231300 Added to the API, the command-line examples, and PowerShell. However, I don't know how the prompt should affect the output; I haven't tested it much.

Apr 03 '23 09:04 Const-me

> @User1231300 Added to the API, the command-line examples, and PowerShell. However, I don't know how the prompt should affect the output; I haven't tested it much.

In my experience, it helps a little by providing context for short clips. For example, take a knock-knock joke: "Knock knock." "Who's there?" "Orange." "Orange who?"

If you send a clip of "Orange who?" to the model with no prompt it will often think the word 'orange' is something else, because that phrase makes no sense without knowing the context. But if the prompt says that the context of the previous reply was "Orange", it will get it right.
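The knock-knock case above could be sketched roughly like this in Python: keep the text of the earlier replies and pass its tail as the prompt for the next short clip. The helper name `build_context_prompt` and the 200-character cap are hypothetical, chosen only for illustration; neither comes from Whisper or from this project.

```python
def build_context_prompt(previous_lines, max_chars=200):
    """Join earlier transcript lines into one prompt, keeping the most recent text.

    max_chars is an arbitrary illustrative cap, not a limit taken from Whisper.
    """
    text = " ".join(line.strip() for line in previous_lines if line.strip())
    if len(text) <= max_chars:
        return text
    # Keep the tail: the most recent context matters most for the next clip.
    start = len(text) - max_chars
    if text[start - 1] != " ":
        # The cut landed inside (or right after) a word; skip to the next word.
        next_space = text.find(" ", start)
        if next_space != -1:
            start = next_space + 1
    return text[start:].lstrip()


history = ["Knock knock.", "Who's there?", "Orange."]
prompt = build_context_prompt(history)
print(prompt)  # Knock knock. Who's there? Orange.
```

The resulting string is what you would hand over as the prompt when transcribing the "Orange who?" clip.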

As an aside, I think there might be something slightly wrong with the new release binaries you just put up, because they are much slower than the previous ones. With the previous ones, 5 seconds of audio with ggml-medium on my RTX 3080 takes 1.6 seconds. With the new binaries you just uploaded, it takes 2.5 seconds.

Apr 03 '23 10:04 clockworkwhale

> @User1231300 Added to the API, the command-line examples, and PowerShell. However, I don't know how the prompt should affect the output; I haven't tested it much.

Thank you very much. I will check it out and get back to you on that.

With non-English languages, providing context helps a TON: it keeps the model from looping over a sentence.

Apr 03 '23 19:04 User1231300

I'm sorry, but I'm trying really hard to understand and I can't. How do I give a prompt? I tried running whisperdesktop.exe from a terminal and adding -h after it to see if that's how I pass parameters. I'm kind of lost.

Apr 03 '23 20:04 User1231300

I also installed it in PowerShell but still can't figure out how to add a prompt.

Apr 03 '23 20:04 User1231300

Update: I might be figuring it out.

Apr 03 '23 20:04 User1231300

Thank you @User1231300 @Const-me for adding the prompt feature.

The initial prompt gives Whisper context about the material so that it generates a more accurate result. For example, you can tell it the exact names or special brand names that will appear in the audio, so that it won't misspell them.

Another use is non-English recognition. In my case, Chinese has two written forms: Traditional Chinese (e.g. 歡迎光臨) and Simplified Chinese (e.g. 欢迎光临). Whisper treats these two forms as one language, so it might output Traditional Chinese when the user wants Simplified Chinese.

I have tested the CLI release and it works nicely. I can use --prompt "繁體中文" to make it output Traditional Chinese:
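As a rough sketch of the names idea, the expected spellings could be collected into one short prompt string. The helper `build_glossary_prompt` is hypothetical, and a comma-separated list is just one plausible way to phrase such a prompt, not something Whisper requires.

```python
def build_glossary_prompt(terms):
    """Turn a list of names into a comma-separated prompt, dropping blanks and duplicates."""
    seen = set()
    unique = []
    for term in terms:
        cleaned = term.strip()
        key = cleaned.casefold()  # compare case-insensitively
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return ", ".join(unique)


# Example names taken from this thread.
names = ["whisper.cpp", "ggml-medium", "Const-me", "whisper.cpp "]
print(build_glossary_prompt(names))  # whisper.cpp, ggml-medium, Const-me
```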

1
00:00:00,000 --> 00:00:06,000
陳寗現在上課,我現在錄一段,剛才我找到整個解決過程

2
00:00:06,000 --> 00:00:17,000
(音樂)

And with --prompt "简体中文", the CLI outputs Simplified Chinese:

1
00:00:00,000 --> 00:00:06,000
趁你现在上课,我现在录一段,刚才我找到整个解决过程

2
00:00:06,000 --> 00:00:16,000
(音声)
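For anyone scripting the two runs above, here is a minimal Python sketch that assembles the command line. It only uses flags that appear in this thread (--prompt, plus -m, -l, -f and -osrt from the help text). The executable name, model path, audio file name and the "zh" language code are placeholders I assumed for the example, not values confirmed here.

```python
import subprocess


def build_whisper_command(exe, model, audio, language, prompt):
    """Return the argument list for one CLI run that writes an .srt file."""
    return [
        exe,
        "-m", model,         # model path
        "-l", language,      # spoken language
        "--prompt", prompt,  # initial prompt, e.g. the desired script
        "-osrt",             # output result in a srt file
        "-f", audio,         # input audio file
    ]


def run_whisper(cmd):
    """Run the command; passing a list avoids shell quoting issues with non-ASCII prompts."""
    return subprocess.run(cmd, check=True)


# Placeholder paths: adjust to wherever your binary, model and audio live.
cmd = build_whisper_command("main.exe", "ggml-medium.bin", "lesson.wav", "zh", "简体中文")
print(cmd)
# run_whisper(cmd)  # uncomment once the paths point at real files
```

Swapping the last argument for "繁體中文" gives the Traditional Chinese variant.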

One last small issue: although this feature has been added, the help page doesn't list the --prompt option, @Const-me:

usage: main [options] file0.wav file1.wav ...

options:
  -h,       --help          [default] show this help message and exit
  -la,      --list-adapters List graphic adapters and exit
  -gpu,     --use-gpu       The graphic adapter to use for inference
  -t N,     --threads N     [4      ] number of threads to use during computation
  -p N,     --processors N  [1      ] number of processors to use during computation
  -ot N,    --offset-t N    [0      ] time offset in milliseconds
  -on N,    --offset-n N    [0      ] segment index offset
  -d  N,    --duration N    [0      ] duration of audio to process in milliseconds
  -mc N,    --max-context N [-1     ] maximum number of text context tokens to store
  -ml N,    --max-len N     [0      ] maximum segment length in characters
  -wt N,    --word-thold N  [0.01   ] word timestamp probability threshold
  -su,      --speed-up      [false  ] speed up audio by x2 (reduced accuracy)
  -tr,      --translate     [false  ] translate from source language to english
  -di,      --diarize       [false  ] stereo audio diarization
  -otxt,    --output-txt    [false  ] output result in a text file
  -ovtt,    --output-vtt    [false  ] output result in a vtt file
  -osrt,    --output-srt    [false  ] output result in a srt file
  -owts,    --output-words  [false  ] output script for generating karaoke video
  -ps,      --print-special [false  ] print special tokens
  -nc,      --no-colors     [false  ] do not print colors
  -nt,      --no-timestamps [false  ] do not print timestamps
  -l LANG,  --language LANG [en     ] spoken language
  -m FNAME, --model FNAME   [models/ggml-base.en.bin] model path
  -f FNAME, --file FNAME    [       ] path of the input audio file

Apr 05 '23 16:04 HaujetZhao

Could somebody help me with an example command for using this version of Whisper from PowerShell? I really need prompts to make sure things like punctuation don't go off the rails, as explained here:

https://github.com/openai/whisper/discussions/194#discussioncomment-3766522

It would be really nice to get this in the UI, alongside being able to queue up multiple files. Otherwise, this version has been amazing.

May 29 '23 08:05 Hentaisocial