GPU power needed for transcribing live stream using large-v3 model
Sorry, this is not really an issue, more of a question or survey, but I didn't find any other place to ask it.
Please tell me, what GPU do you use to transcribe a live stream (e.g. a phone call) with the FasterWhisper backend and the large-v3 model? And in your experience, how heavily is your GPU stressed by this task?
I'm asking because I run this kind of task on a single Nvidia RTX A4000 GPU, which in my experience is clearly underpowered for it. The GPU runs at 100% almost all the time, and the transcription quality is often poor, which I suspect is simply because the GPU cannot keep up. I currently run WhisperLive in a GPU Docker container, but I had a similar experience earlier when running it in a Python virtual environment.
I wonder whether WhisperLive is really so much more demanding than WhisperX (which also uses the FasterWhisper backend), which I use for transcribing phone call recordings. WhisperX transcribes recordings multiple times faster than realtime on the same GPU, so this really confuses me.
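One way to make this comparison concrete is the real-time factor (RTF): processing time divided by audio duration. Batch tools like WhisperX process long chunks at once and report RTF well below 1, while a streaming setup must transcribe each short chunk before the next one arrives, so an RTF at or above 1 means falling behind. A minimal sketch with hypothetical numbers (the durations below are made up for illustration, not measurements from my setup):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1: faster than real time. RTF >= 1: the GPU cannot keep up."""
    return processing_seconds / audio_seconds

# Hypothetical batch case: a 60 s recording transcribed in 12 s
rtf_batch = real_time_factor(12.0, 60.0)
print(rtf_batch)  # 0.2, i.e. 5x faster than real time

# Hypothetical streaming case: each 5 s chunk takes 6 s to transcribe
rtf_stream = real_time_factor(6.0, 5.0)
print(rtf_stream)  # 1.2, i.e. steadily falling behind the live audio
```

This might also partly explain the gap: a streaming pipeline pays the model's per-inference overhead on every small chunk, so the same GPU can look fast in batch mode and overloaded in live mode.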
Is there anybody running WhisperLive on NVIDIA DGX Spark or AMD Strix Halo? What is your experience?