
How to assign more CPU to this Python script

B1gM8c opened this issue 1 year ago • 5 comments

Is there any possible way to assign more CPU to this script? Honestly, it's super fast on my Windows machine. However, I discovered that it only uses maybe 60%-70% of the CPU, so is there any way to make full use of my CPU? Or is there any other way to improve the speed without losing quality?

B1gM8c avatar Mar 09 '23 15:03 B1gM8c

By default it uses 4 CPU threads. You can set the constructor argument cpu_threads to use a different value:

model = WhisperModel(model_path, device="cpu", cpu_threads=6)
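
For completeness, a minimal end-to-end sketch; "path/to/ct2-model" and "audio.wav" are placeholders:

from faster_whisper import WhisperModel

# cpu_threads controls how many threads a single transcription uses
model = WhisperModel("path/to/ct2-model", device="cpu", cpu_threads=6)

# transcribe returns a generator of segments plus metadata about the audio
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))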

guillaumekln avatar Mar 09 '23 15:03 guillaumekln

Thank you for your patient explanation! I have another question: does this project support Whisper's translation feature? It seems that the task can be specified as transcribe or translate.

B1gM8c avatar Mar 10 '23 01:03 B1gM8c

Yes, you can set task="translate" to translate any audio to English:

model.transcribe(..., task="translate")
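
For example, a short sketch (the model path and "audio.wav" are placeholders):

from faster_whisper import WhisperModel

model = WhisperModel("path/to/ct2-model", device="cpu")
# task="translate" makes Whisper output English regardless of the source language
segments, info = model.transcribe("audio.wav", task="translate")
for segment in segments:
    print(segment.text)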

guillaumekln avatar Mar 10 '23 09:03 guillaumekln

I modified the code as you instructed and tested with the micro-machines.wav file provided on the Whisper website; the audio is about 29 seconds long. I tested subtitle recognition with the large-v2 model on the CPU of a Windows computer. Here are the results:

  • Using 4 cores, the processing time was 1 minute and 38 seconds.
  • Using 12 cores, the processing time was 1 minute and 35 seconds.
  • Using 16 cores, the processing time was 1 minute and 29 seconds.

It appears that allocating more CPU cores did not significantly improve the processing time. Should I try using a GPU for testing, or should I attempt to further modify the code to allow for concurrent processing of a single file, even though it may affect the order of processing? Additionally, I noticed that there is a num_workers field in your code, but it seems to refer to the number of tasks being run simultaneously. Therefore, I would like to know what other ways there are to further optimize the processing time. Would specifying the language directly before running the model reduce the time cost of language auto-detection?

B1gM8c avatar Mar 10 '23 10:03 B1gM8c

How many physical CPU cores does your system have? Note that the processing time will barely decrease when using more threads than the number of physical CPU cores.
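
A quick way to check (assumes the third-party psutil package is installed):

import os
import psutil  # pip install psutil

print("logical CPUs:", os.cpu_count())
print("physical cores:", psutil.cpu_count(logical=False))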

> Should I try using a GPU for testing

Of course, if you have access to a GPU it will speed things up quite a lot, especially for the large models.

> should I attempt to further modify the code to allow for concurrent processing of a single file, even though it may affect the order of processing? Additionally, I noticed that there is a num_workers field in your code, but it seems to refer to the number of tasks being run simultaneously.

num_workers allows multiple concurrent calls to transcribe to run in parallel. You could split the audio into 30-second chunks and call transcribe from multiple Python threads, but it will not be much faster than setting cpu_threads (and since each chunk loses the previous context, the transcription will also differ).

num_workers is mostly useful when this project is used in a multithreaded webserver and you want multiple transcriptions to run in parallel.
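
As a rough illustration of that chunking pattern, a sketch assuming the audio was pre-split into 30-second files (file names are placeholders); num_workers must cover the number of concurrent calls, and cross-chunk context is lost:

from concurrent.futures import ThreadPoolExecutor
from faster_whisper import WhisperModel

# num_workers lets up to 4 transcribe() calls run in parallel on one model
model = WhisperModel("path/to/ct2-model", device="cpu", num_workers=4)

def transcribe_chunk(path):
    segments, _ = model.transcribe(path)
    return "".join(segment.text for segment in segments)

chunks = ["chunk_00.wav", "chunk_01.wav", "chunk_02.wav"]  # pre-split 30s pieces
with ThreadPoolExecutor(max_workers=4) as pool:
    texts = list(pool.map(transcribe_chunk, chunks))
print(" ".join(texts))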

> Would specifying the language directly before running the model reduce the time cost of language auto-detection?

Sure, that would help.
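
For example:

from faster_whisper import WhisperModel

model = WhisperModel("path/to/ct2-model", device="cpu")
# Declaring the language up front skips the auto-detection pass
segments, info = model.transcribe("audio.wav", language="en")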

> Therefore, I would like to know what other ways there are to further optimize the processing time.

There are multiple options that can reduce the processing time on CPU, at the cost of different, usually worse, transcriptions (combined in the sketch after this list):

  • Load the model in INT8 with compute_type="int8"
  • Use beam_size=1
  • Disable the temperature fallback with temperature=0
  • Disable the context with condition_on_previous_text=False (only relevant for audio longer than 30s)
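
Combined, a minimal sketch (model path and audio file are placeholders):

from faster_whisper import WhisperModel

# INT8 weights reduce compute cost on CPU
model = WhisperModel("path/to/ct2-model", device="cpu", compute_type="int8")

segments, info = model.transcribe(
    "audio.wav",
    beam_size=1,                       # beam of size 1 (greedy-like decoding)
    temperature=0,                     # disable the temperature fallback
    condition_on_previous_text=False,  # drop cross-window context
)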

guillaumekln avatar Mar 10 '23 10:03 guillaumekln

I tried building this program on a GPU server with a T4 graphics card. The result was striking: a 25-second video takes over a minute to recognize on my computer, but it took only 9 seconds on the GPU server. It's really powerful. Thanks to the author for the code!

B1gM8c avatar Mar 17 '23 05:03 B1gM8c

Cool, that's a nice speedup!

Let me close this issue since all the questions were answered. Feel free to open new issues if you have other specific questions.

guillaumekln avatar Mar 17 '23 09:03 guillaumekln