
Change onnxruntime requirement to GPU version and update VAD to run on GPU

thomasmol opened this pull request • 4 comments

See discussions here: https://github.com/pyannote/pyannote-audio/issues/1481, https://github.com/guillaumekln/faster-whisper/issues/493, https://github.com/guillaumekln/faster-whisper/issues/364#issuecomment-1645272083

This pull request lets the VAD run on GPU by depending on onnxruntime-gpu rather than onnxruntime. Depending on both packages causes problems: onnxruntime defaults to the CPU version if both are installed. This is mostly an issue when running faster-whisper alongside pyannote.audio (or other libraries that specifically need onnxruntime-gpu to run on GPU).
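
As a quick check (a minimal sketch, not part of this pull request), onnxruntime's get_available_providers() reports which execution providers the installed build exposes:

import onnxruntime

# Plain onnxruntime typically reports only ['CPUExecutionProvider'];
# onnxruntime-gpu should also report 'CUDAExecutionProvider'.
print(onnxruntime.get_available_providers())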

thomasmol · Sep 30 '23

Does onnxruntime-gpu fall back to CPU if there is no GPU? Not everyone is using CUDA; some are running on CPU only.

celliso1 · Oct 24 '23

Unfortunately it does not, so I don't think this pull request will be accepted. I'll leave it open for now in case anyone runs into the same issue that led me to create it.

thomasmol · Oct 25 '23

Per https://onnxruntime.ai/docs/execution-providers/, you can set multiple Execution Providers. I'm not savvy enough today to try this myself, but would it fix the problem?

import onnxruntime as rt

# Define the priority order for the execution providers:
# prefer the CUDA Execution Provider over the CPU Execution Provider.
EP_list = ['CUDAExecutionProvider', 'CPUExecutionProvider']

# Initialize an inference session with model.onnx.
sess = rt.InferenceSession("model.onnx", providers=EP_list)
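
As a follow-up sketch (get_providers() is part of the same onnxruntime session API), you can verify which providers the session actually enabled; if CUDA could not be initialized, only the CPU provider should be listed:

import onnxruntime as rt

# get_providers() reports the providers the session actually enabled, in
# priority order. If CUDA initialization failed, only
# 'CPUExecutionProvider' should appear. "model.onnx" is a placeholder path.
sess = rt.InferenceSession("model.onnx",
                           providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print(sess.get_providers())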

celliso1 · Oct 25 '23

FYI, you shouldn't run it on CUDA, as the VAD model is not meant to run on GPU.

Benchmark on ~2 h of audio with an RTX 4090:

CUDA: 72.22 seconds
CPU: 15.15 seconds
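
Given those numbers, one option is to pin the VAD session to the CPU provider even when onnxruntime-gpu is installed. A minimal sketch, assuming a local silero_vad.onnx file (the path and thread settings here are illustrative, not faster-whisper's actual code):

import onnxruntime as ort

# Hedged sketch: pin a small model such as the Silero VAD to the CPU
# provider, which the benchmark above suggests is much faster for it.
# "silero_vad.onnx" is a placeholder path.
opts = ort.SessionOptions()
opts.inter_op_num_threads = 1   # small model; extra threads add overhead
opts.intra_op_num_threads = 1
vad_session = ort.InferenceSession("silero_vad.onnx",
                                   sess_options=opts,
                                   providers=["CPUExecutionProvider"])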

Purfview · Mar 27 '24