whisper.cpp Talk = GPT-2 + Whisper + WASM

Open ggerganov opened this issue 3 years ago • 3 comments

I just had an awesome idea:

Make a web-page that:

Listens when someone speaks
Transcribes the words using WASM Whisper
Generates a new sentence using WASM GPT-2
Uses the Web Speech API to synthesise the speech and play it through the speakers

All of this runs locally in the browser, with no server required.
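The four steps above can be sketched as a single "turn" function. This is only a minimal sketch under stated assumptions: `TalkStages`, `talkTurn`, and the three stage signatures are hypothetical names chosen for illustration, not part of whisper.cpp or its WASM builds. The stages are passed in as plain functions so the control flow can be read (and tested) without any model loaded.

```typescript
// Hypothetical stage signatures; none of these names come from whisper.cpp.
interface TalkStages {
  // Speech-to-text, e.g. a wrapper around a WASM Whisper build (assumed).
  transcribe: (audio: Float32Array) => Promise<string>;
  // Text generation, e.g. a wrapper around a WASM GPT-2 build (assumed).
  generate: (prompt: string) => Promise<string>;
  // Text-to-speech, e.g. the Web Speech API when running in a browser.
  speak: (text: string) => Promise<void>;
}

interface TalkResult {
  heard: string;
  reply: string;
}

// Run one listen -> transcribe -> generate -> speak turn.
// Returns null when the transcription is empty (silence or noise),
// in which case nothing is generated or spoken.
async function talkTurn(
  audio: Float32Array,
  stages: TalkStages,
): Promise<TalkResult | null> {
  const heard = (await stages.transcribe(audio)).trim();
  if (heard.length === 0) {
    return null;
  }
  const reply = (await stages.generate(heard)).trim();
  await stages.speak(reply);
  return { heard, reply };
}
```

In a browser, the `speak` stage could plausibly wrap `speechSynthesis.speak(new SpeechSynthesisUtterance(text))` and resolve on the utterance's `end` event. The other two stages depend on how the WASM modules are exposed, which the issue does not specify.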

I have all the ingredients and I think the performance is just good enough. I just have to put it all together. The total data that the page will have to load on startup (probably using the Fetch API) is:

74 MB for the Whisper tiny.en model
240 MB for the GPT-2 small model
Web Speech API is built into modern browsers
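As a rough back-of-the-envelope check on the startup cost, the sizes listed above can be summed and turned into a transfer-time estimate. This is a sketch only: the asset labels and the helper names `totalDownloadMB` and `estimateSeconds` are hypothetical, and the estimate ignores compression, caching, and protocol overhead.

```typescript
// Sizes are taken from the list above; the names are illustrative labels only.
interface ModelAsset {
  name: string;
  sizeMB: number;
}

const assets: ModelAsset[] = [
  { name: "whisper-tiny.en", sizeMB: 74 },
  { name: "gpt-2-small", sizeMB: 240 },
  // The Web Speech API ships with the browser, so it adds nothing here.
];

// Sum the download size of all assets, in megabytes.
function totalDownloadMB(list: ModelAsset[]): number {
  return list.reduce((sum, asset) => sum + asset.sizeMB, 0);
}

// Rough transfer time in seconds for a link speed given in megabits per second.
function estimateSeconds(totalMB: number, linkMbps: number): number {
  if (linkMbps <= 0) {
    throw new RangeError("linkMbps must be positive");
  }
  return (totalMB * 8) / linkMbps; // 8 bits per byte
}
```

With the figures above, `totalDownloadMB(assets)` gives 314 MB, which is roughly 25 seconds on a 100 Mbps connection.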

I think it will be a lot of fun because you could talk to the web-page or even add extra devices that talk to each other only through the mic and the speakers. For example, you simply open the page on your phone and tablet, put them next to each other, and listen to them talk about something 😄

Any ideas to make this even more fun?

Nov 17 '22 16:11 ggerganov

this sounds really fun!

Nov 17 '22 16:11 eschmidbauer

So.. this is turning out to be even better than I expected 😆

https://user-images.githubusercontent.com/1991296/202914175-115793b1-d32e-4aaa-a45b-59e313707ff6.mp4

Nov 20 '22 16:11 ggerganov

These results are extremely impressive! I recently tried to implement something similar in Python, using various online APIs rather than running locally. It felt worse than your demo video, because Whisper is much better than the free Google Speech Recognition API, and your optimized version runs significantly better on CPU than the standard Whisper Python library I tried :).

Nov 20 '22 22:11 Vuizur
