faster-whisper
How to assign more CPU on this python script
Is there any possible way to assign more CPU to this script? Honestly, it's super fast on my Windows machine. However, I noticed that it only uses maybe 60%-70% of the CPU, so is there any way to make full use of my CPU? Or is there any other way to improve the speed without losing quality?
By default it uses 4 CPU threads. You can set the constructor argument cpu_threads
to use a different value:
model = WhisperModel(model_path, device="cpu", cpu_threads=6)
Thank you for your patient explanation! I have another question: does this project support Whisper's translation task? It seems that the task can be specified as transcribe or translate.
Yes, you can set task="translate"
to translate any audio to English:
model.transcribe(..., task="translate")
I modified the relevant code per your instructions and tested with the micro-machines.wav file provided on the Whisper website. The audio duration is approximately 29 seconds. I tested subtitle recognition with the large-v2 model on a Windows computer's CPU. Here are the specific results:
- Using 4 cores, the processing time was 1 minute and 38 seconds.
- Using 12 cores, the processing time was 1 minute and 35 seconds.
- Using 16 cores, the processing time was 1 minute and 29 seconds.
It appears that allocating more CPU cores did not significantly improve the processing time. Should I try using a GPU for testing, or should I attempt to further modify the code to allow for concurrent processing of a single file, even though it may affect the order of processing? Additionally, I noticed that there is a num_workers field in your code, but it seems to refer to the number of tasks being run simultaneously. Therefore, I would like to know what other ways there are to further optimize the processing time. Would specifying the language directly before running the model reduce the time cost of language auto-detection?
How many physical CPU cores does your system have? Note that the processing time will barely improve when using more threads than the number of physical CPU cores.
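For reference, here is one way to check the physical core count (a minimal sketch assuming the third-party psutil package is installed; os.cpu_count() only reports logical cores):

import os
import psutil  # third-party: pip install psutil

print(psutil.cpu_count(logical=False))  # physical cores
print(os.cpu_count())  # logical cores, including hyperthreads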
Should I try using a GPU for testing
Of course, if you have access to a GPU it would speed things up quite a lot, especially for the large models.
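For example, loading the model on a CUDA GPU (device and compute_type are the library's constructor arguments; float16 is a common choice on GPU):

model = WhisperModel(model_path, device="cuda", compute_type="float16")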
should I attempt to further modify the code to allow for concurrent processing of a single file, even though it may affect the order of processing? Additionally, I noticed that there is a num_workers field in your code, but it seems to refer to the number of tasks being run simultaneously.
num_workers allows multiple concurrent calls to transcribe to run in parallel. You could split the audio into 30s chunks and call transcribe from multiple Python threads, but it will not be much faster than setting cpu_threads (also it will not use the previous context, so the transcription will be different).
num_workers is mostly useful when this project is used in a multithreaded webserver and you want multiple transcriptions to run in parallel.
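As a sketch of that multi-request case (num_workers and transcribe are the actual API; the thread pool setup and file names are illustrative placeholders):

from concurrent.futures import ThreadPoolExecutor

from faster_whisper import WhisperModel

# num_workers > 1 lets several transcribe() calls run in parallel
model = WhisperModel(model_path, device="cpu", cpu_threads=4, num_workers=2)

def transcribe_file(path):
    segments, info = model.transcribe(path)
    return " ".join(segment.text for segment in segments)

# placeholder file names for two independent requests
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(transcribe_file, ["a.wav", "b.wav"]))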
Would specifying the language directly before running the model reduce the time cost of language auto-detection?
Sure, that would help.
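For example (language is the transcribe parameter; the language code and file name here are illustrative):

segments, info = model.transcribe("audio.wav", language="en")  # skips auto-detection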
Therefore, I would like to know what other ways there are to further optimize the processing time.
There are multiple options that can reduce the processing time on CPU, at the cost of different, usually worse, transcriptions (see the combined sketch after this list):
- Load the model in INT8 with compute_type="int8"
- Use beam_size=1
- Disable the temperature fallback with temperature=0
- Disable the context with condition_on_previous_text=False (only useful for audio longer than 30s)
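Putting these options together (a sketch combining the parameters listed above; the file name is a placeholder):

model = WhisperModel(model_path, device="cpu", compute_type="int8")
segments, info = model.transcribe(
    "audio.wav",
    beam_size=1,
    temperature=0,
    condition_on_previous_text=False,
)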
I tried running this program on a GPU server with a T4 graphics card. The result was shocking: a 25-second video takes over a minute to recognize on my computer, but took only 9 seconds on the GPU server. It's really powerful. Thanks to the author's code!
Cool, that's a nice speedup!
Let me close this issue as all questions were answered. Feel free to reopen other issues if you have other specific questions.