
Limited GPU Utilization with NVIDIA RTX 4000 Ada Gen

Open James-Shared-Studios opened this issue 9 months ago • 13 comments

I am experiencing limited GPU utilization with the NVIDIA RTX 4000 Ada Gen card. System details:

OS: Windows 10 1809
CPU: AMD EPYC 3251 8-Core Processor 2.5 GHz
RAM: 32GB
GPU: NVIDIA RTX 4000 Ada Gen 20 GB
CUDA Toolkit Version: 12.3
GPU Driver Version: 546.12

Python code:

   import os
   import time

   from faster_whisper import WhisperModel

   device = 'cuda'
   compute_type = 'int8_float16'
   model_size = 'medium.en'

   print("Loading model...")

   # Time how long the model takes to load
   start_time = time.time()
   model = WhisperModel(model_size, device=device,
                        compute_type=compute_type)
   end_time = time.time()
   execution_time = end_time - start_time
   print(f"Model loading time: {execution_time:.2f} seconds")
   folder_path = r"C:\Users\XYZ\Downloads\AI voice"
   max_new_tokens = 10
   beam_size = 10
   total_processing_time = 0.0  # accumulated transcription time across all files

   for filename in os.listdir(folder_path):
       # Only process files with a supported audio/video extension
       if filename.endswith((".mp3", ".m4a", ".mp4", ".wav")):
           file_path = os.path.join(folder_path, filename)
           print(f"Transcribing file: {file_path}")
           start_time = time.time()
           segments, _ = model.transcribe(file_path,
                                          beam_size=beam_size,
                                          max_new_tokens=max_new_tokens,
                                          word_timestamps=False,
                                          prepend_punctuations="",
                                          append_punctuations="",
                                          language="en", condition_on_previous_text=False)
           # segments is a generator; the transcription runs while it is iterated
           for segment in segments:
               print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
           end_time = time.time()
           execution_time = end_time - start_time
           print(f"Execution time: {execution_time:.2f} seconds")
           total_processing_time += execution_time

While running my code, I observe only around 10% GPU utilization. [image]

However, the same code achieves 100% utilization on an NVIDIA GeForce RTX 4070. [image]
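To put comparable numbers on the two cards, utilization can be sampled from the command line while the transcription script runs. The following is a minimal sketch, assuming `nvidia-smi` is installed and on the PATH; the helper names `parse_utilization` and `sample_gpu_utilization` are hypothetical and not part of faster-whisper.

```python
import subprocess
import time


def parse_utilization(output: str) -> list[int]:
    """Parse nvidia-smi CSV output: one integer percentage per GPU, one per line.

    Non-numeric entries (for example "[N/A]") are skipped.
    """
    values = []
    for line in output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def sample_gpu_utilization(samples: int = 10, interval_s: float = 0.5) -> list[list[int]]:
    """Poll nvidia-smi and return one list of per-GPU percentages per sample."""
    readings = []
    for _ in range(samples):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        readings.append(parse_utilization(result.stdout))
        time.sleep(interval_s)
    return readings


# Example (run in a second terminal while the transcription loop is active):
#   readings = sample_gpu_utilization(samples=20, interval_s=1.0)
#   print(readings)
```

Running the same sampling on both machines during the same audio file would give figures that can be compared directly, instead of reading them off a graph.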

James-Shared-Studios, May 17 '24 04:05