
GPU Performance

Open jake1271 opened this issue 1 year ago • 11 comments

I have a few GPUs from my two mining rigs, thought I'd share my findings.

Setup:

  - CPU: i3-13100F (single-core Geekbench 6 score ~2300)
  - GPUs: 2080 Ti (11 GB, 352-bit bus), 3070 (8 GB, 256-bit bus), 3060 Ti (8 GB, 256-bit bus)
  - RAM: 16 GB

Transcribing an 8 min English audio clip:

  1. Using the main PCIe 16x slot: both the 2080 Ti and 3060 Ti transcribe the 8-minute audio with the medium model in about 45 seconds.
  2. Using a PCIe 1x slot extender: the 3060 Ti takes 2 min 25 sec; the 3070 is a few seconds faster at 2 min 17 sec.

Using OpenAI Whisper with PyTorch GPU support, there is little difference between the PCIe 16x and 1x slots: the 3070 does the same 8-minute transcription in 1 min 50 sec on 16x and loses only a few seconds on the 1x slot. On a side note, the M1 Max running Whisper.cpp does the same 8-minute transcription in a similar 1 min 45 sec, so the M1 Max CPU roughly equals a 3070 GPU here.

Not sure why the 2080 Ti and 3060 Ti are so close in performance when the 2080 Ti is about 60% faster in FP16 throughput; perhaps CPU bottlenecking? CPU utilization is only around 20%, but something seems to be bottlenecking the GPUs.

Conclusion: the Const-me implementation is blazing fast on the 16x slot, and it seems like there's more GPU performance to unlock, plus some possible optimizations for slower (1x) slots.
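For a rough comparison, the times above can be converted into a real-time factor (seconds of processing per second of audio). A quick sketch using the numbers reported in this thread (labels are mine, not from any tool):

```python
# Real-time factor (RTF) = transcription time / audio duration,
# using the times reported above for the 8-minute test clip.
AUDIO_SECONDS = 8 * 60  # 480 s

results_sec = {
    "2080 Ti / 3060 Ti, medium, PCIe x16 (Const-me)": 45,
    "3070, medium, PCIe x1 (Const-me)": 137,        # 2m 17s
    "3060 Ti, medium, PCIe x1 (Const-me)": 145,     # 2m 25s
    "3070, medium, PCIe x16 (PyTorch)": 110,        # 1m 50s
    "M1 Max (Whisper.cpp)": 105,                    # 1m 45s
}

for name, secs in results_sec.items():
    print(f"{name}: RTF {secs / AUDIO_SECONDS:.2f}, "
          f"{AUDIO_SECONDS / secs:.1f}x real time")
```

So the 45-second run is roughly 10.7x faster than real time, while the PCIe 1x runs drop to around 3.3x.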

jake1271 avatar Apr 11 '23 04:04 jake1271

If you share your test audio file, i can add A5000 and A3000 mobile speeds :D

emcodem avatar Apr 11 '23 08:04 emcodem

Sure, here ya go; it's a tutorial on installing Whisper. https://user-images.githubusercontent.com/51808736/231159408-ddfce1a5-a633-46c0-9d44-cbd9af14f531.mp4

And for those wondering what the heck a PCIe 1x slot extender is: it's the sled the GPUs are sitting on in the pic, mainly used on mining rigs to hook up multiple GPUs.

(photo: GPUs mounted on PCIe 1x riser sleds)

jake1271 avatar Apr 11 '23 13:04 jake1271

Thanks, here are the results from the cards I have easy access to, both professional versions. Models used are from this commit: https://huggingface.co/ggerganov/whisper.cpp/commit/80da2d8bfee42b0e836fc3a9890373e5defc00a6

Model large-v2:

  - Nvidia RTX A3000 Laptop GPU / i9-11950H: RunComplete 121.833 seconds
  - RTX A5000 (desktop) / Core i9 9900K: RunComplete 87.1558 seconds

Model medium:

  - Nvidia RTX A3000 Laptop GPU / i9-11950H: RunComplete 77.5319 seconds
  - RTX A5000 (desktop) / Core i9 9900K: RunComplete 49.7725 seconds
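From the RunComplete times above, the desktop A5000's advantage over the laptop A3000 can be computed directly (a quick sketch, using only the numbers reported here):

```python
# RunComplete times (seconds) reported above for the two cards.
times = {
    "large-v2": {"A3000 laptop": 121.833, "A5000 desktop": 87.1558},
    "medium":   {"A3000 laptop": 77.5319, "A5000 desktop": 49.7725},
}

for model, t in times.items():
    speedup = t["A3000 laptop"] / t["A5000 desktop"]
    print(f"{model}: A5000 is {speedup:.2f}x faster")
```

That works out to roughly 1.4x faster on large-v2 and 1.6x on medium.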

While transcribing, the CPU (on both laptop and desktop, regardless of the model used) looks like this, which indicates a "slight" CPU speed bottleneck on my machines but nothing worrying:

(screenshot: CPU utilization graph during transcription)

It would be very interesting to see results from a 4070/4080 ^^

emcodem avatar Apr 12 '23 12:04 emcodem

Newer tests show that the original whisper.cpp project, running on an Apple M1 Ultra, can actually beat my A5000 above by about 20% while drawing far less power ^^

emcodem avatar Apr 27 '23 16:04 emcodem

Intel Arc A380 using WhisperDesktop v1.12

medium.en model: 93 sec
large-v1 model: 171 sec

@emcodem

What app did you use for the large-v2 model? I think WhisperDesktop is incompatible with it.

NikosDi avatar Sep 17 '23 16:09 NikosDi

I use Whisper Desktop and Whisperer. Both of them work fine with the large model on my PC. I have a 12 GB GPU.


RickArcher108 avatar Sep 17 '23 19:09 RickArcher108

Yes, I tried both and they work fine with my A380 too, large-v1 and large-v2. Large-v2 is a little slower, but probably more accurate.

The reason I asked is that a lot of people mentioned in this app's open issues that large-v2 is buggy and doesn't work properly with WhisperDesktop; it's not a VRAM issue, since the requirements are the same as v1's.

But it seems to work fine.

NikosDi avatar Sep 18 '23 06:09 NikosDi

@NikosDi I never tried large-v1, but I wouldn't expect much difference between v1 and v2 in VRAM usage, accuracy, speed, or "errors". If your assumption about v2 not working in WhisperDesktop refers to https://github.com/Const-me/Whisper/issues/166, note that the problem there was that the asker used a totally wrong model format (non-GGML).

emcodem avatar Sep 18 '23 10:09 emcodem

I assume he's talking about this thread, which uses the presidential audio test example where large-v2 performs much worse with the Const-me Whisper:

https://github.com/Const-me/Whisper/issues/61

Maybe it's been resolved since it was originally posted, but I haven't done any recent testing, and the models haven't been updated to resolve any potential conversion issues anyway, if that's the root cause as some have suggested.

albino1 avatar Sep 18 '23 18:09 albino1

@albino1 Funny enough, I'm also under the impression that the simple greedy search in the Const-me version is generally equal to, if not better than, beam search (which the other projects tend to default to). I dug into some documents and the source code of the other projects, and came away with the impression that they are over-engineering everything: trying to apply lessons learned from chat (GPT) models, which don't fully carry over to the Whisper use case, and especially trying to suppress endlessly repeated output with a lot of largely worthless logic ^^
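For context on what "simple greedy search" means here: at each decoding step, greedy search keeps only the single highest-probability token, while beam search carries multiple partial hypotheses forward. A minimal illustration (not the Const-me implementation, just a sketch of the idea):

```python
def greedy_decode(step_logprobs):
    """Pick the argmax token at every step; no alternative hypotheses are kept."""
    return [max(range(len(p)), key=p.__getitem__) for p in step_logprobs]

# Toy example: 3 decoding steps over a 4-token vocabulary (log-probabilities).
logprobs = [
    [-0.1, -2.3, -3.0, -4.0],
    [-1.5, -0.2, -2.7, -3.1],
    [-2.0, -1.8, -0.3, -2.5],
]
print(greedy_decode(logprobs))  # -> [0, 1, 2]
```

Greedy decoding is cheaper per step and, as noted above, often holds its own against beam search for transcription, where the audio itself constrains the output far more than in open-ended chat generation.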

emcodem avatar Sep 18 '23 19:09 emcodem

> Intel Arc A380 using WhisperDesktop v1.12
>
> medium.en model: 93 sec
> large-v1 model: 171 sec

Hey bro, I've tested this mp4 file with my 5600G's Vega iGPU: 404 sec. Your GPU is pretty fast.

Whisper-Padawan avatar Feb 23 '24 01:02 Whisper-Padawan