[Discussion] Speaker diarisation options

Open Vaibhavs10 opened this issue 1 year ago • 7 comments

Currently, we use pyannote for speaker diarisation. However, there is still scope for improvement here, and we should be able to leverage other open-source packages such as NVIDIA NeMo.

I'd like to know whether the community has any experience with this, and how pyannote and NeMo compare for diarisation.

Copying a comment from #46:

Might I suggest using the NeMo toolkit instead? It seems to avoid pyannote's requirement of a Hugging Face key to access their model. omarsiddiqi224 posted a link to a repository that relies on it instead of pyannote.

Vaibhavs10, Nov 30 '23 15:11

My two cents :-) One can actually use pyannote pretrained models without Hugging Face authentication. Once the model has been downloaded and cached (yes, this first download needs an HF token), you no longer need the token for subsequent calls.

@Vaibhavs10, you might want to add an insanely-fast-whisper download --hf-token ... command to do just that (= download and cache the models once and for all). Subsequent calls to insanely-fast-whisper would then use this cached version...

The only reason for this HF token thing is for me to know a bit more about my user base.
I am completely blind without this. Thanks for your understanding.

hbredin, Nov 30 '23 15:11

Yeah! Makes sense! Adding an HF token, in my opinion, is not much of an inconvenience. I'll rework the overall API a bit more over the weekend to make it easier for people to use.

Looking at the codebase, do you have any suggestions for making the diarisation process even faster, btw?

Vaibhavs10, Nov 30 '23 17:11

Here is one repo:

https://github.com/MahmoudAshraf97/whisper-diarization

Also, for diarization, you haven't updated the README file, and it's not working in Colab.

akashAD98, Dec 01 '23 05:12

@Vaibhavs10 @akashAD98 Can you please provide example code or a notebook for diarization? Something like this one: https://github.com/Vaibhavs10/insanely-fast-whisper/blob/main/notebooks/infer_faster_whisper_large_v2.ipynb

olegchomp, Dec 12 '23 02:12

Do we have any updates on running pyannote locally? Would it be possible to download the model and run it locally?

omarsiddiqi224, Dec 13 '23 17:12

@omarsiddiqi224 - It already does. Once the weights have been downloaded the first time, it should work locally without passing the token.

Vaibhavs10, Dec 17 '23 13:12
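
For readers who want to see the "download once, then run without a token" behaviour discussed in this thread, here is a minimal sketch. It assumes the default Hugging Face hub cache layout (models--<org>--<name>/snapshots/...) and the pyannote.audio 3.x keyword use_auth_token for Pipeline.from_pretrained; newer releases may name that keyword differently. The helper names hub_cache_dir, is_cached, and load_diarization_pipeline are hypothetical and are not part of insanely-fast-whisper or pyannote. Checking a single repo is also a simplification, since a diarisation pipeline may pull in further model repos of its own.

```python
import os
from pathlib import Path
from typing import Optional


def hub_cache_dir() -> Path:
    """Return the Hugging Face hub cache directory (default layout assumed)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def is_cached(repo_id: str, cache_dir: Optional[Path] = None) -> bool:
    """True if at least one snapshot of `repo_id` exists in the cache."""
    cache_dir = cache_dir or hub_cache_dir()
    # Assumed layout: <cache>/models--<org>--<name>/snapshots/<revision>/
    snapshots = cache_dir / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


def load_diarization_pipeline(repo_id: str, hf_token: Optional[str] = None):
    """Load a pyannote pipeline, insisting on a token only if nothing is cached."""
    if not is_cached(repo_id) and hf_token is None:
        raise RuntimeError(
            f"{repo_id} is not cached yet; pass an HF token once to download it."
        )
    # Imported lazily so the cache check works without pyannote.audio installed.
    from pyannote.audio import Pipeline

    # Assumption: `use_auth_token` is the pyannote.audio 3.x keyword.
    return Pipeline.from_pretrained(repo_id, use_auth_token=hf_token)
```

On a fresh machine one would call load_diarization_pipeline("pyannote/speaker-diarization-3.1", hf_token="hf_...") once (the repo ID is only an example), and afterwards call it without hf_token.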