whisper
add hotwords feature
Hello! During transcription I often encounter proprietary or newly coined vocabulary that Whisper cannot handle well. I searched for solutions, and the community offers two options:
- Fine-tuning the model: this approach is costly, and it's not practical to fine-tune the model every time a new term emerges.
- Using `initial_prompt`: however, `initial_prompt` only applies to the first window. If specialized terms don't appear at the beginning, this method is ineffective.
Looking at other transcription models, I found that supporting hotwords is common practice, so I implemented this feature. My approach is to add hotword-related prompts before each transcription window. Since there's a maximum length limit, I reuse the space previously occupied by the prefix: hotwords take effect when the prefix isn't set. In my testing, this resolved the issue of specialized vocabulary in my scenario.
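The idea can be sketched roughly like this (a simplified sketch of the approach, not the actual PR diff; `tokenizer_encode` stands in for Whisper's tokenizer, and the token IDs in the usage below are illustrative):

```python
# Rough sketch of hotword injection: before decoding each window, the
# hotword tokens are placed after <|startofprev|>, in the slot normally
# used by the prefix/prompt. Names here are illustrative, not the PR diff.
def build_initial_tokens(tokenizer_encode, sot_prev, sot_sequence,
                         hotwords=None, n_ctx=448):
    tokens = list(sot_sequence)
    if hotwords is not None:
        hot = tokenizer_encode(" " + hotwords.strip())
        # keep at most half the text context, minus one slot for <|startofprev|>
        hot = hot[: n_ctx // 2 - 1]
        tokens = [sot_prev] + hot + tokens
    return tokens
```

Because the hotword tokens are rebuilt for every window, the terms stay in scope for the whole file, unlike `initial_prompt`.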
The following are the community discussions on this issue:
- https://github.com/openai/whisper/discussions/1477
- https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311
- https://stackoverflow.com/questions/73833916/how-can-i-give-some-hint-phrases-to-openais-whisper-asr
@jongwook Hello, please check out this PR.
Would this be duplicated effort, since there is already a parameter that serves the same purpose, `condition_on_previous_text`? If `condition_on_previous_text` is set to `True`, the previous output of the model is provided as a prompt for the next window. Correct me if I'm wrong. Thank you.
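For context, the feedback loop that `condition_on_previous_text` implements can be modeled in a few lines (a toy sketch with no Whisper dependency; `decode` stands in for the model):

```python
# Toy model of condition_on_previous_text: each window's decoded text is
# fed forward as the prompt for the NEXT window only. Hotwords differ in
# that a fixed prompt is injected before every window, regardless of what
# the previous window produced.
def transcribe_windows(windows, decode, condition_on_previous_text=True):
    prompt, pieces = "", []
    for window in windows:
        text = decode(window, prompt)
        pieces.append(text)
        if condition_on_previous_text:
            prompt = text  # previous output becomes the next prompt
    return " ".join(pieces)
```

So if a rare term never appears in the model's own output, conditioning on previous text cannot introduce it, which is the gap hotwords aim to fill.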
@James-Shared-Studios This isn't used to add context; it's used to add hotwords so that Whisper can recognize new words or terms when they come up. For example, ComfyUI is a new word (it is a powerful and modular Stable Diffusion GUI and backend); if you don't add it as a hotword, it won't be recognized correctly.
I tried it with a video where the following words were misspelled:

- "Kalichain" => "cl chain", "cali chain"
- "Kalicertif" => "c cerff", "cl ciff", "Cali certif"
- "Kalismarket" => "C's Market"
- "Kalishare" => "Cali share"
- "Kalistoken" => "Cali's token"
- "kijiji" => "kiji"
And indeed it worked to make these words no longer misspelled, with the following args:

```
whisper video.opus --hotwords "Kalichain, Kalicertif, Kalismarket, Kalishare, Kalistoken, kijiji, MEXC, Kalissa, FireHustle"
```

But it didn't work 100% of the time; sometimes they were still misspelled. Notably, `Kalicertif` was misspelled as `Kalistertif`.
So, when passing a series of proper nouns via hotwords, what is the maximum length that is actually supported? @jax-explorer
@JiweiZh It depends on the `n_text_ctx` value in the model's `dims`.
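For reference, the released Whisper checkpoints ship with `n_text_ctx = 448` (check `model.dims.n_text_ctx` for your own model), so under this PR's `n_ctx // 2 - 1` split the hotword budget works out as follows (a quick arithmetic sketch, assuming that standard value):

```python
# Token-budget arithmetic under the PR's split (assumes n_text_ctx = 448,
# the value used by the released Whisper checkpoints).
n_text_ctx = 448
hotword_budget = n_text_ctx // 2 - 1  # half the context, minus <|startofprev|>
print(hotword_budget)  # 223
```

That budget is in tokens, not words, so multi-token proper nouns consume more of it than their word count suggests.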
@jax-explorer Hello, I find this commit very useful and hope it gets merged soon. Currently, I'm using your forked repository to enjoy this feature. By the way, I have some questions about your implementation:
- You say that you occupy the space for `prefix`, but I'm not sure where the `prefix` comes from. Is `condition_on_previous_text` related to `prefix`?
- The current implementation divides `n_ctx` by 2 and assigns `prompt` and `hotwords` evenly. If I want to use `hotwords` more, is it valid to change `n_ctx // 2` to some other number? For example, I would skip `prompt` and use only `hotwords` whenever `hotwords` are provided, like below:
```python
if (hotwords := self.options.hotwords) is not None:
    hotwords_tokens = self.tokenizer.encode(" " + hotwords.strip())
    hotwords_tokens = hotwords_tokens[: self.n_ctx]  # use more hotwords
    tokens = (
        [self.tokenizer.sot_prev]
        + hotwords_tokens
        # + (prompt_tokens[-(self.n_ctx // 2 - 1) :] if self.options.prompt is not None else [])
        + tokens
    )
```
Thanks!
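One caveat with slicing to `self.n_ctx` as in the snippet above: the SOT sequence and the tokens generated during decoding live in the same text context, so an unconditional cap at `n_ctx` can overflow it. A hedged sketch of a safer bound (an illustrative helper, not part of the PR):

```python
# Bound the hotword tokens by the context that is actually left over,
# reserving one slot for <|startofprev|> itself. Hypothetical helper,
# not code from the PR.
def cap_hotwords(hotwords_tokens, existing_tokens, n_ctx):
    room = n_ctx - len(existing_tokens) - 1
    return hotwords_tokens[: max(room, 0)]
```

Whatever cap is chosen, leaving some headroom for generated tokens is what the original `n_ctx // 2 - 1` split implicitly provides.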