
It does not work normally on the RTX 5070 Ti.

c3xingchen opened this issue 8 months ago • 14 comments

The RTX 5070 Ti fails with RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED. Adding the --compute_type float32 option lets it run, but long videos still fail: on videos longer than 2 hours, the program transcribes only the first 33 minutes and then stops abruptly.
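For reference, one way to work around this kind of per-GPU failure is to try compute types in order and keep the first one that loads. This is a sketch of my own, not part of faster-whisper; with the real library the loader callable would be something like `lambda ct: WhisperModel("large-v3", device="cuda", compute_type=ct)`, and the candidate ordering is illustrative:

```python
def pick_compute_type(load_model, candidates=("int8_float16", "float16", "float32")):
    """Try compute types in order; return the first that loads.

    load_model: callable taking a compute_type string and returning a model.
    On 50xx cards, int8 variants reportedly raise
    RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED.
    """
    last_error = None
    for compute_type in candidates:
        try:
            return compute_type, load_model(compute_type)
        except RuntimeError as exc:
            last_error = exc  # e.g. CUBLAS_STATUS_NOT_SUPPORTED
    raise RuntimeError("no supported compute_type found") from last_error
```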

c3xingchen avatar Mar 10 '25 03:03 c3xingchen

What is your cuBLAS version, or which CUDA Toolkit version do you have installed?

Purfview avatar Mar 10 '25 11:03 Purfview

> What is your cuBLAS version, or which CUDA Toolkit version do you have installed?

I am not a specialist in this area. I have not installed cuBLAS or the CUDA Toolkit myself. I use PotPlayer to generate subtitles with this model on my laptop's RTX 3060, which works fine. However, the same process on my RTX 5070 Ti runs into these issues.

c3xingchen avatar Mar 10 '25 12:03 c3xingchen

> RTX 5070

Don't know why compute_type auto [int8] doesn't work with these GPUs; use --compute_type float16.

> I use PotPlayer to generate subtitles using this model

Then you are in the wrong repo; go there: https://github.com/Purfview/whisper-standalone-win

Purfview avatar Mar 10 '25 13:03 Purfview

I also can't use it with my 5070ti. Basically, all 50 series cards are unusable.

ictsmc avatar Mar 12 '25 15:03 ictsmc

> I also can't use it with my 5070ti. Basically, all 50 series cards are unusable.

Use proper settings.

Purfview avatar Mar 20 '25 13:03 Purfview

> RTX 5070
>
> Don't know why compute_type auto [int8] doesn't work with these GPUs; use --compute_type float16.
>
> I use PotPlayer to generate subtitles using this model
>
> Then you are in the wrong repo; go there: https://github.com/Purfview/whisper-standalone-win

Thanks for this, it just helped me as well. I can confirm it did not work on auto or int8, but did work on float16.

teddybear082 avatar Mar 28 '25 19:03 teddybear082

@teddybear082 @ictsmc, do you have this issue when using Python and this repo?

Purfview avatar Mar 29 '25 13:03 Purfview

> @teddybear082 @ictsmc, do you have this issue when using Python and this repo?

I’m using the faster-whisper Python library via WingmanAI by ShipBit: https://github.com/ShipBit/wingman-ai. They use PyInstaller to turn the Python code into an exe, I believe, and faster-whisper is one of the dependencies.

teddybear082 avatar Mar 29 '25 14:03 teddybear082

So not Python directly. Kinda strange that I have lots of reports about this in my repo, but I don't see any in the Python repos.

Btw, there are similar reports about pyannote and 50xx GPUs, but none in the pyannote repo either.

Purfview avatar Mar 29 '25 14:03 Purfview

> So not Python directly. Kinda strange that I have lots of reports about this in my repo, but I don't see any in the Python repos.
>
> Btw, there are similar reports about pyannote and 50xx GPUs, but none in the pyannote repo either.

What do you mean, not Python directly? Isn't this the repo for the faster-whisper PyPI project? Wingman depends on the faster-whisper==1.1.1 Python library, I believe. I may just be misunderstanding what you mean.

teddybear082 avatar Mar 29 '25 15:03 teddybear082

I meant using Python directly, not an exe compiled with PyInstaller. And it's strange that all the reports about 50xx cards come only from the "exe" repos.

Maybe it's because the original default for compute_type is not "auto"; I don't remember now, it could be "default".
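For context, as I understand CTranslate2 (which faster-whisper wraps), compute_type="default" keeps the quantization the model was converted with, while "auto" picks the fastest type the device claims to support. A toy sketch of that difference; the function, the candidate ordering, and the supported-type sets are purely illustrative, not CTranslate2's actual logic:

```python
def resolve_compute_type(requested, model_type, device_supported):
    """Toy model of compute_type resolution (illustrative only).

    requested: "default", "auto", or an explicit type.
    model_type: the quantization the converted model already uses.
    device_supported: set of types the device reports as supported.
    """
    if requested == "default":
        # Keep whatever quantization the converted model ships with.
        return model_type
    if requested == "auto":
        # Pick the fastest supported type (ordering here is a guess).
        for candidate in ("int8_float16", "int8", "float16", "float32"):
            if candidate in device_supported:
                return candidate
    # An explicit request is passed through unchanged.
    return requested
```

This would explain the asymmetry in reports: a Python caller leaving the default alone never hits the int8 path, while an exe that explicitly passes "auto" does.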

Purfview avatar Mar 29 '25 15:03 Purfview

Here is link to the issue at CTranslate2: https://github.com/OpenNMT/CTranslate2/issues/1865

Purfview avatar Mar 29 '25 15:03 Purfview

Same issue on my 5070 Ti. Is there any way to force it to use the CPU rather than the GPU?
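faster-whisper does accept a `device` argument, so pinning inference to the CPU is possible. A hedged sketch: the `model_kwargs` helper is hypothetical (my own), and the specific type choices just reflect what this thread reports working; with the real library you would pass it as `WhisperModel("small", **model_kwargs(force_cpu=True))`:

```python
def model_kwargs(force_cpu: bool) -> dict:
    """Hypothetical helper: pick device settings for WhisperModel."""
    if force_cpu:
        # CPU path: int8 keeps memory use and speed reasonable on CPU.
        return {"device": "cpu", "compute_type": "int8"}
    # GPU path: float16 is the type reported to work on 50xx cards.
    return {"device": "cuda", "compute_type": "float16"}
```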

tedaz avatar Jul 05 '25 12:07 tedaz

Hi everyone,

I’m running into the same issue on an RTX 5070 Ti, and oddly I see better performance on an RTX 2070 SUPER. Here’s what I’ve observed:

RTX 5070 Ti

With compute_type='int8_float16', I get: RuntimeError: cuBLAS failed with status CUBLAS_STATUS_NOT_SUPPORTED. Switching to compute_type='float16' works, but my transcription speed is only around 21× real‑time.

RTX 2070 SUPER

I can use compute_type='int8_float16' without errors and achieve about 86× real‑time speed.

NahuBocco avatar Jul 17 '25 19:07 NahuBocco

Was any solution found here?

penolove avatar Nov 27 '25 17:11 penolove

> Was any solution found here?

Yes.

Purfview avatar Nov 27 '25 21:11 Purfview