No Speech Detection

Open ZachNagengast opened this issue 1 year ago • 2 comments

No-speech detection can be done with logit filters on the first decoder loop, similar to language detection. However, that approach does not work when a prefill prompt (i.e. forced decoder tokens) is in use, so that case will need special handling. Ideally, there would be an option to ignore the prefill prompt for the first decoder loop in order to detect no speech. This costs one extra loop, but it may allow skipping the entire window, which is useful when developers expect long stretches of silence in their input audio.
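To make the idea concrete, here is a minimal sketch in Python (the language of the OpenAI reference), not WhisperKit code. It assumes a hypothetical `decode_step` callable that takes a token list and returns the logits for the next position, and hypothetical token ids `sot_token` and `no_speech_token`. The default threshold of 0.6 is an assumption, chosen to mirror the `no_speech_threshold` default I believe OpenAI's transcribe function uses. The extra pass feeds only the start-of-transcript token, so any prefill prompt is deliberately left out.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def no_speech_probability(sot_logits: Sequence[float], no_speech_token: int) -> float:
    """Probability mass assigned to the no-speech token on the first loop."""
    return softmax(sot_logits)[no_speech_token]


def should_skip_window(
    decode_step: Callable[[List[int]], Sequence[float]],  # hypothetical decoder hook
    sot_token: int,                                       # hypothetical token id
    no_speech_token: int,                                 # hypothetical token id
    threshold: float = 0.6,                               # assumed default
) -> bool:
    """Run one extra decoder loop WITHOUT the prefill prompt and decide
    whether the whole audio window can be skipped as silence."""
    # Only the start-of-transcript token is fed in, so forced decoder
    # tokens cannot mask the no-speech probability.
    logits = decode_step([sot_token])
    return no_speech_probability(logits, no_speech_token) > threshold
```

If `should_skip_window` returns true, the caller would skip the window entirely; otherwise it would proceed with the normal decode, prefill prompt included. Note that the OpenAI implementation also checks the average log probability before discarding a segment, which this sketch leaves out.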

References

OpenAI implementation: https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/decoding.py#L692-L693
WhisperKit inline todo: https://github.com/argmaxinc/WhisperKit/blob/228630c37e4ac1b1c95790d77f64058d317f8859/Sources/WhisperKit/Core/TextDecoder.swift#L497
https://github.com/argmaxinc/WhisperKit/blob/228630c37e4ac1b1c95790d77f64058d317f8859/Sources/WhisperKit/Core/WhisperKit.swift#L612-L616

ZachNagengast, Feb 16 '24 22:02

Hi, can I work on this issue?

aigerimmmm, Jun 19 '24 20:06

Absolutely! @aigerimmmm, it's all yours.

ZachNagengast, Jun 19 '24 22:06