faster-whisper
WhisperModel's num_workers parameter: how to make execution truly parallel?
The WhisperModel constructor has a num_workers parameter. I only have one GPU, and when multiple threads call the transcribe function, execution is effectively serial in my tests: two threads processing two files take twice as long as one thread processing one file. How can I make the execution truly parallel? A minimal sketch of my setup is shown below.
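A minimal sketch of the setup described above, assuming a single GPU and two placeholder files (audio1.wav, audio2.wav); with this configuration the two transcribe() calls still end up running one after the other:

```python
from concurrent.futures import ThreadPoolExecutor

from faster_whisper import WhisperModel

# Single GPU, two workers: concurrent transcribe() calls are accepted,
# but both end up competing for the same device.
model = WhisperModel("large-v2", device="cuda", num_workers=2)

def transcribe_file(path):
    segments, _info = model.transcribe(path)
    # Consume the generator so the transcription actually runs.
    return "".join(segment.text for segment in segments)

files = ["audio1.wav", "audio2.wav"]  # placeholder file names
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(transcribe_file, files))
```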
@fanqiangwei, hello. I think using multiple workers on a single GPU will not increase throughput; you should use multiple GPUs with the device_index option. For more information, see this issue: https://github.com/SYSTRAN/faster-whisper/issues/100
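A minimal sketch of what the multi-GPU approach could look like, assuming two GPUs (device indices 0 and 1) and the same placeholder file names; adjust the indices and model size to your hardware:

```python
from concurrent.futures import ThreadPoolExecutor

from faster_whisper import WhisperModel

# Load the model replicated across two GPUs (indices 0 and 1 are assumed
# here). num_workers lets concurrent transcribe() calls use separate
# workers instead of queueing behind each other.
model = WhisperModel(
    "large-v2",
    device="cuda",
    device_index=[0, 1],
    compute_type="float16",
    num_workers=2,
)

def transcribe_file(path):
    segments, _info = model.transcribe(path)
    # Consume the generator so the transcription actually runs.
    return path, "".join(segment.text for segment in segments)

files = ["audio1.wav", "audio2.wav"]  # placeholder file names
with ThreadPoolExecutor(max_workers=2) as executor:
    for path, text in executor.map(transcribe_file, files):
        print(path, text)
```

With the model spread over two devices, each thread's transcription can be served by a different GPU, which is where the actual parallel speedup comes from.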