faster-whisper
Change onnxruntime requirement to GPU version and update VAD to run on GPU
See discussions here: https://github.com/pyannote/pyannote-audio/issues/1481, https://github.com/guillaumekln/faster-whisper/issues/493, https://github.com/guillaumekln/faster-whisper/issues/364#issuecomment-1645272083
This pull request lets the VAD run on GPU by depending on onnxruntime-gpu rather than onnxruntime. There are some issues when depending on both packages: onnxruntime will default to the CPU version if both are installed. This is mostly a problem when running faster-whisper in conjunction with pyannote.audio (or other libraries that specifically need onnxruntime-gpu to run on GPU).
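As a quick sanity check (not part of this PR), you can inspect which providers the installed onnxruntime build actually exposes; "model.onnx" below is just a placeholder:

import onnxruntime as rt

# providers the installed onnxruntime build supports;
# onnxruntime-gpu should list CUDAExecutionProvider, while a shadowing
# CPU-only install typically reports only CPUExecutionProvider
print(rt.get_available_providers())

# providers a given session actually ended up with
sess = rt.InferenceSession("model.onnx", providers=rt.get_available_providers())
print(sess.get_providers())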
Does onnxruntime-gpu fall back to CPU support if there is no GPU? Not everyone is using CUDA; some are running on CPU.
Unfortunately it does not, so I don't think this pull request will get accepted. I'll leave it open for now in case anyone runs into the same issue that led me to create this pull request.
Per https://onnxruntime.ai/docs/execution-providers/, you can set multiple Execution Providers. I'm not savvy enough today to try this myself, but would it fix the problem?
import onnxruntime as rt

# define the priority order for the execution providers:
# prefer the CUDA Execution Provider over the CPU Execution Provider
EP_list = ['CUDAExecutionProvider', 'CPUExecutionProvider']

# initialize an inference session for model.onnx
sess = rt.InferenceSession("model.onnx", providers=EP_list)
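If the onnxruntime docs are accurate, a session created with this provider list falls back to CPUExecutionProvider when no CUDA device is available, which would address the CPU-fallback question above.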
FYI, you shouldn't run the VAD on CUDA; the model is not meant to run on GPU.
Benchmark on ~2 h of audio with an RTX 4090:
CUDA: 72.22 seconds
CPU: 15.15 seconds
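For anyone who wants to reproduce this kind of comparison, here is a minimal, hypothetical timing sketch; the model path, dummy-input construction, and run count are assumptions, not the exact benchmark behind the numbers above:

import time
import numpy as np
import onnxruntime as rt

# partial map from ONNX tensor type strings to numpy dtypes
DTYPES = {"tensor(float)": np.float32, "tensor(int64)": np.int64}

def bench(model_path, providers, n_runs=100):
    sess = rt.InferenceSession(model_path, providers=providers)
    # build a dummy feed from the declared inputs, using 1 for dynamic
    # dimensions (real models like the VAD may require specific shapes)
    feed = {}
    for inp in sess.get_inputs():
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]
        feed[inp.name] = np.zeros(shape, dtype=DTYPES.get(inp.type, np.float32))
    sess.run(None, feed)  # warm-up so lazy CUDA initialization isn't timed
    start = time.perf_counter()
    for _ in range(n_runs):
        sess.run(None, feed)
    return (time.perf_counter() - start) / n_runs

# "model.onnx" is a placeholder; compare CUDA-with-fallback vs CPU-only
for providers in (["CUDAExecutionProvider", "CPUExecutionProvider"],
                  ["CPUExecutionProvider"]):
    print(providers[0], bench("model.onnx", providers))

For a model as small as the VAD, per-chunk transfer and kernel-launch overhead can dominate on GPU, which would be consistent with the CPU winning here.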