whisper-auto-transcribe
speaker diarization
I have tested pyannote-audio for speaker diarization. The error rate is about 30%, and it needs a lot of extra install steps. On the other hand, segmentation (and VAD) works pretty well. I'll temporarily put speaker diarization on hold until the beta version is complete.
A successful example of whisper + speaker diarization:
https://github.com/MahmoudAshraf97/whisper-diarization
Another example: https://huggingface.co/spaces/vumichien/Whisper_speaker_diarization/blob/main/app.py
```python
import os

import torch
from transformers import pipeline
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding

device = 0 if torch.cuda.is_available() else "cpu"
pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,
    device=device,
)
os.makedirs('output', exist_ok=True)
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task="transcribe")

embedding_model = PretrainedSpeakerEmbedding(
    "speechbrain/spkrec-ecapa-voxceleb",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"))
```
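After embedding each transcribed segment with the speechbrain model, the linked app assigns speaker labels by clustering those embeddings. A minimal sketch of that clustering step, using synthetic embeddings in place of real speechbrain output so it runs without audio (the function name `assign_speakers` and the synthetic data are illustrative, not part of the app):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def assign_speakers(embeddings, num_speakers):
    """Cluster per-segment speaker embeddings into integer speaker labels."""
    clustering = AgglomerativeClustering(n_clusters=num_speakers)
    return clustering.fit_predict(embeddings).tolist()

# Two well-separated synthetic "speakers", three segments each
# (192 dims, roughly matching an ECAPA-style embedding size).
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 192)),  # speaker A segments
    rng.normal(5.0, 0.1, size=(3, 192)),  # speaker B segments
])
labels = assign_speakers(emb, num_speakers=2)
print(labels)
```

Each label then gets attached to its segment's start/end timestamps from the whisper output to produce the final "who spoke when" transcript.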