pyannote-audio
speaker-diarization-3.1 using 0% gpu
I'm using Google Colab and it is utilizing 0% GPU. Sometimes it spikes to 100% for a second and then goes back to 0%. The audio is about 1.5 hours long. Is this normal behaviour? Keep in mind the CPU is at 100% almost all of the time.
Thank you for your issue. You might want to check the FAQ if you haven't done so already.
Feel free to close this issue if you found an answer in the FAQ.
If your issue is a feature request, please read this first and update your request accordingly, if needed.
If your issue is a bug report, please provide a minimum reproducible example as a link to a self-contained Google Colab notebook containing everything needed to reproduce the bug:
- installation
- data preparation
- model download
- etc.
Providing an MRE will increase your chance of getting an answer from the community (either maintainers or other power users).
Companies relying on pyannote.audio in production may contact me via email regarding:
- paid scientific consulting around speaker diarization and speech processing in general;
- custom models and tailored features (via the local tech transfer office).
This is an automated reply, generated by FAQtory
Same here
I somehow fixed this on my M3 by running:
pip uninstall pyannote.audio
conda install -c conda-forge pyannote.core
pip install pyannote.audio
This worked for me after uninstalling onnxruntime and onnxruntime-gpu:
pip install optimum[onnxruntime-gpu]
See this link:
https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu#installation
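To check that the GPU build of onnxruntime is actually the one being picked up after the reinstall, you can list its execution providers (just a quick sanity check, not part of pyannote itself):

import onnxruntime as ort

# The CPU-only package exposes only 'CPUExecutionProvider'; after installing
# onnxruntime-gpu you should also see 'CUDAExecutionProvider' in this list.
print(ort.get_available_providers())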
@KennethTrinh I don't understand how that'll work, because the new version isn't dependent on ONNX.
@KennethTrinh I tested your solution and it didn't work for me.
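For anyone else hitting this, it may be worth ruling out the obvious first: confirm that torch can see the GPU at all, since if it can't, the pipeline silently runs on CPU and 0% GPU usage is expected (a generic check, nothing pyannote-specific):

import torch

# False here means the pipeline's .to(device) falls back to CPU.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))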
Oops, I was running in a different environment than you guys (an EC2 instance) - apologies! I tried to reproduce in a Colab notebook with a plain old T4 GPU.
Non-blocking code to poll the GPU for usage and memory (I don't have Cloud Shell since I'm poor!) - run this first if you don't have Cloud Shell:
import subprocess
import threading
import time

def run_nvidia_smi():
    # Poll nvidia-smi once per second and append the result to a log file,
    # so GPU usage can be inspected while the diarization cell is running.
    while True:
        try:
            output = subprocess.check_output(
                ['nvidia-smi', '--query-gpu=timestamp,utilization.gpu,utilization.memory', '--format=csv']
            )
            output_str = output.decode('utf-8').strip()
            with open('output.log', 'a') as f:
                f.write(output_str + '\n')
        except subprocess.CalledProcessError as e:
            print(f"Error running nvidia-smi: {e}")
        time.sleep(1)

thread = threading.Thread(target=run_nvidia_smi, daemon=True)  # daemon so it dies with the kernel
thread.start()
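If you'd rather stop the poller cleanly once diarization finishes instead of letting it run forever, here is a minimal variant using threading.Event (my own tweak, not in the original snippet):

import subprocess
import threading

stop_event = threading.Event()

def run_nvidia_smi_stoppable():
    # Same polling loop, but exits when stop_event is set;
    # wait(1) doubles as the one-second sleep between polls.
    while not stop_event.wait(1):
        try:
            output = subprocess.check_output(
                ['nvidia-smi', '--query-gpu=timestamp,utilization.gpu,utilization.memory', '--format=csv']
            )
            with open('output.log', 'a') as f:
                f.write(output.decode('utf-8').strip() + '\n')
        except subprocess.CalledProcessError as e:
            print(f"Error running nvidia-smi: {e}")

thread = threading.Thread(target=run_nvidia_smi_stoppable)
thread.start()
# later, once the diarization cell has finished:
stop_event.set()
thread.join()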
Code to run the diarization - don't forget to define your TOKEN beforehand:
!pip install -q --upgrade pyannote.audio
!pip install -q transformers==4.35.2
!pip install -q datasets

import torch
from pyannote.audio import Pipeline
from datasets import load_dataset

DIARIZATION_MODEL = "pyannote/speaker-diarization-3.1"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stream a single multi-speaker sample rather than downloading the whole set.
concatenated_librispeech = load_dataset("sanchit-gandhi/concatenated_librispeech", split="train", streaming=True)
sample = next(iter(concatenated_librispeech))

diarization_pipeline = Pipeline.from_pretrained(
    DIARIZATION_MODEL,
    use_auth_token=TOKEN,
).to(device)

# Pass an in-memory waveform of shape (channel, time) instead of a file path.
input_tensor = torch.from_numpy(sample['audio']['array']).float().unsqueeze(0)  # (channel, time)
output = diarization_pipeline({"waveform": input_tensor, "sample_rate": sample['audio']['sampling_rate']})
output
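As a side note, the raw output above is a pyannote.core Annotation; if you want readable speaker turns, something like this works:

# Print each speaker turn as start/end times plus the speaker label.
for turn, _, speaker in output.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")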
My logs show that the GPU was indeed used (albeit very little, but my audio was only (347360,) samples, so this may change with your audio). The key difference is that I'm passing in a torch tensor of shape (channel, time), but you guys are just passing the .wav file?
timestamp, utilization.gpu [%], utilization.memory [%]
2023/11/27 19:34:24.171, 1 %, 0 %
2023/11/27 19:34:25.188, 6 %, 1 %
2023/11/27 19:34:26.208, 5 %, 0 %
I found that when passing a filename to the speaker diarization pipeline, I got very poor performance. Upon profiling I discovered this was due to many, many calls to get_torchaudio_info and torchaudio._backend.ffmpeg._load_audio_fileobj. This indicated to me that the file was being reprocessed many times unnecessarily (all of it coming from the "crop" method). I noticed that there were very different codepaths if the incoming file object already had a "waveform" computed, so I did the following:
import torchaudio

# Decode the file once up front and hand the pipeline an in-memory waveform,
# so crop() never has to re-open the file.
waveform, sample_rate = torchaudio.load("segment_0.wav")
audio_file = {"waveform": waveform, "sample_rate": sample_rate}
and then I passed audio_file to my pipeline. This took my runtime from 5m to 13s.
I suspect it would be straightforward to change the code to perform this step internally up front and save the user the trouble... and it would probably also simplify a lot of the downstream code, since it could then assume it always has a waveform.
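Until something like that lands upstream, a tiny wrapper does the job on the caller's side; diarize_file below is a hypothetical helper of my own, not part of pyannote's API:

import torchaudio
from pyannote.audio import Pipeline

def diarize_file(pipeline: Pipeline, path: str):
    # Hypothetical convenience wrapper: load the audio once so the pipeline
    # works on the in-memory waveform instead of re-reading the file.
    waveform, sample_rate = torchaudio.load(path)
    return pipeline({"waveform": waveform, "sample_rate": sample_rate})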
https://github.com/pyannote/pyannote-audio/issues/1557#issuecomment-1922466847 (the comment above) also solved it for me, using an eGPU and CUDA.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.