whisper.cpp
How to get the best performance on a moderately old CPU and hardware?
Hello! Very nice project and port.
I am wondering, using an i7-6700 or an even older CPU, is there a possibility of getting near-realtime transcription? (That is, taking at most as long to compute as the audio takes to play.)
I am using the Windows binaries from the release page, and it takes about twice as long to generate the text as the audio length. I am looking to get that about even, if possible.
As an aside, it would be nice to make use of older hardware via CUDA. I have an older GPU here (a GT 730) which is not supported by PyTorch (it has Compute Capability 3.5, while PyTorch requires at least 3.7). I am wondering if some "generic" optimization could be implemented using CUDA, even without the fancy PyTorch stack. That would let my older server run this speech-to-text, and it could give many more people access to this wonderful tech (I'm thinking of vision-impaired people, for example, who are not rich enough to get the latest Apple M3).
Thanks!
Your i7-6700 has Intel® AVX2, so it should do OK...
But if it's too slow, AND you are running Windows anyway, AND you have a GPU in the machine, it might be worth checking out this Whisper implementation based on whisper.cpp: https://github.com/Const-me/Whisper
It uses the Windows GPU compute APIs (so I believe it works with any GPU card, not just NVIDIA/CUDA ones).
I've had VERY good results from it; it's just a pity it's Windows-only... I wish there were a Vulkan compute implementation :) (hint to anyone who has a clue about Vulkan)
Jay
Thanks for taking the time to answer me and for pointing me towards Const-me's approach! It is very nice to see work being done to bring this tech to all the computers in the world.
Sadly, for my older server, the CPU seems either too old or too underpowered to make it work (it appears to lack F16C support, bummer). I have yet to try it on my tower PC.
If anyone else has a suggestion on how I could go about running this on an old Pentium G3220, I'd love it :)
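For anyone in a similar spot, one way to check up front which SIMD features a CPU actually reports is to grep /proc/cpuinfo; a minimal sketch, assuming Linux with GNU grep:

```sh
# Print which of the relevant SIMD flags the CPU advertises (Linux / GNU grep).
# Haswell-era Pentiums such as the G3220 typically ship with AVX/AVX2/FMA/F16C disabled.
grep -o -w 'avx2\|avx\|fma\|f16c\|sse3\|ssse3' /proc/cpuinfo | sort -u
```

Whatever shows up (or doesn't) tells you which compile-time defines need to be switched off, which leads into the suggestion below.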
Try turning off the AVX2 and FMA and even the AVX defines at compile time if you have trouble. It worked for me.
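If it helps, here's roughly what that looks like with the CMake build; I believe whisper.cpp exposes options along these lines, but double-check CMakeLists.txt for the exact names in your version:

```sh
# Sketch: build whisper.cpp with the SIMD-related defines turned off.
# The WHISPER_NO_* option names are from memory; verify them against CMakeLists.txt.
cmake -B build -DWHISPER_NO_AVX2=ON -DWHISPER_NO_FMA=ON \
      -DWHISPER_NO_AVX=ON -DWHISPER_NO_F16C=ON
cmake --build build --config Release
```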
Great idea, I never thought of compiling it myself; I just used the binaries from the release page. Thanks for the info! I will try it soon and update this issue.
Yes, let us know. I run an i5 from 2013. The sample jfk.wav takes 4.6 seconds to encode on a single thread with the tiny.en model, and 2.6 seconds with 4 threads.
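For anyone wanting to reproduce numbers like these, something along these lines should work with the bundled main example (paths assume the default repo layout and that the model has already been downloaded):

```sh
# Transcribe the bundled sample with the tiny.en model: single thread vs. 4 threads.
./main -m models/ggml-tiny.en.bin -f samples/jfk.wav -t 1
./main -m models/ggml-tiny.en.bin -f samples/jfk.wav -t 4
```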
I have an i5-8265U and I'm quite amazed at the performance.
However... I noticed there's about 2.7 s of overhead with the base model (1.3 s with tiny) for a single word of input (<1 s of audio). I assume most of that is loading the model. A 10 s input sample of several sentences takes only 25% longer.
That's plenty fast for transcription, but quite a bit of latency for interactive use. I tried the stream example, but it didn't produce very good output text.
It would be great if whisper.cpp could be built as an HTTP server, maybe even implementing OpenAI's API protocol.