[Feature request] Implement OpenAI Whisper for STT Option

Open pablogranolabar opened this issue 2 years ago • 6 comments

Feature Use Case

Implement OpenAI Whisper ASR for SOTA STT (speech-to-text) and wake word triggers.

Feature Proposal

OpenAI recently released Whisper, a SOTA ASR model. Recent developments around Whisper include third-party model implementations that support distilled model weights and reduced-precision inference, which is sufficient to run Whisper on CPU platforms.

pablogranolabar avatar Nov 13 '22 18:11 pablogranolabar

Interesting, thanks. Added to this roadmap card and this one.

louistiti avatar Nov 14 '22 00:11 louistiti

From the model card: "While Whisper models cannot be used for real-time transcription out of the box – their speed and size suggest that others may be able to build applications on top of them that allow for near-real-time speech recognition and translation."

Currently Whisper works on 30-second chunks of audio, so I guess Leon's responses would become very delayed.

johannbarbie avatar Jan 28 '23 16:01 johannbarbie

Thanks for pointing this out. I'll take a closer look once I start focusing on it.

louistiti avatar Jan 31 '23 15:01 louistiti

Nah, Whisper is configurable for whatever input length you specify. We have a Flutter port going now that is near real-time on mobile. The larger models on CPU should be real-time in performance.

pablogranolabar avatar Feb 21 '23 04:02 pablogranolabar
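
For anyone following the chunk-length discussion, here is a minimal sketch of one way short inputs can be fed to a model with a fixed 30-second window: cut the incoming audio into short clips and zero-pad each clip to the full window. The 16 kHz sample rate and 30-second window match the released Whisper models; the function names and the 2-second chunk length are hypothetical and only for illustration, and this is not necessarily how the Flutter port mentioned above works.

```python
from typing import Iterator, List

SAMPLE_RATE = 16_000      # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30       # fixed input window of the released models
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def pad_to_window(samples: List[float], window: int = WINDOW_SAMPLES) -> List[float]:
    """Zero-pad (or truncate) a clip so it fills exactly one model window."""
    if len(samples) >= window:
        return samples[:window]
    return samples + [0.0] * (window - len(samples))


def short_chunks(stream: List[float], chunk_seconds: float = 2.0) -> Iterator[List[float]]:
    """Yield consecutive short clips (hypothetical 2 s default), each padded to a full window."""
    step = int(chunk_seconds * SAMPLE_RATE)
    for start in range(0, len(stream), step):
        yield pad_to_window(stream[start:start + step])
```

With this approach the wait before a clip can be transcribed is bounded by the chunk duration rather than by the 30-second window, although each clip still costs a full-window inference pass, so whether it feels real-time depends on model size and hardware.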