insanely-fast-whisper [Discussion] Speaker diarisation options

Currently, we use pyannote for speaker diarisation. However, there is still scope for improvement here, and we should be able to leverage other open-source packages such as NVIDIA NeMo.

I'd like to know whether the community has any experience with this, and how pyannote and NeMo compare for diarisation.

Copying a comment from #46:

Might I suggest using the NeMo toolkit instead? It seems to avoid pyannote's requirement of a Hugging Face token to access their model. omarsiddiqi224 is the one who posted a link to a repository that relies on NeMo instead of pyannote.

Nov 30 '23 15:11 Vaibhavs10

My two cents :-) One can actually use pyannote pretrained models without Hugging Face authentication. Once the model has been downloaded and cached (yes, this first download needs an HF token), you no longer need the token for subsequent calls.

@Vaibhavs10, you might want to add an insanely-fast-whisper download --hf-token ... command to do just that (i.e. download and cache the models once and for all). Subsequent calls to insanely-fast-whisper would then use the cached version...

The only reason for this HF token requirement is so that I can learn a bit more about my user base. I am completely blind without it. Thanks for your understanding.

Nov 30 '23 15:11 hbredin
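A minimal sketch of the download-once workflow suggested above, assuming pyannote.audio 3.x (where Pipeline.from_pretrained accepts use_auth_token) and the pyannote/speaker-diarization-3.1 checkpoint; neither the version nor the checkpoint is named in this thread. The download subcommand, the --model option, and the helper names are hypothetical illustrations, not insanely-fast-whisper's actual CLI.

```python
import argparse
import os
from typing import Optional, Sequence

# Assumed checkpoint; the thread does not name a specific pyannote model.
DEFAULT_MODEL = "pyannote/speaker-diarization-3.1"


def resolve_token(cli_token: Optional[str]) -> Optional[str]:
    """Prefer an explicit --hf-token, then fall back to the HF_TOKEN env var."""
    return cli_token or os.environ.get("HF_TOKEN")


def load_pipeline(model: str = DEFAULT_MODEL, hf_token: Optional[str] = None):
    """Load the diarisation pipeline (assumes pyannote.audio 3.x).

    The first call needs a token to download the gated weights; per the
    comment above, later calls are served from the local Hugging Face cache.
    """
    # Imported lazily so the CLI parser works even without pyannote installed.
    from pyannote.audio import Pipeline
    return Pipeline.from_pretrained(model, use_auth_token=resolve_token(hf_token))


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI with a 'download' subcommand that caches weights once."""
    parser = argparse.ArgumentParser(prog="insanely-fast-whisper")
    sub = parser.add_subparsers(dest="command", required=True)
    dl = sub.add_parser("download", help="download and cache diarisation weights")
    dl.add_argument("--hf-token", default=None)
    dl.add_argument("--model", default=DEFAULT_MODEL)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    if args.command == "download":
        load_pipeline(args.model, args.hf_token)
        print(f"Cached {args.model}; later runs can omit the token.")


if __name__ == "__main__":
    main()
```

Running the download subcommand once with a token populates the cache; afterwards, calling load_pipeline() without a token should resolve the model from that cache, as described in the comment above.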

Yeah, makes sense! In my opinion, adding an HF token is not much of an inconvenience. I'll rework the overall API a bit over the weekend to make it easier for people to use.

By the way, looking at the codebase, do you have any suggestions for making the diarisation process even faster?

Nov 30 '23 17:11 Vaibhavs10

Here is one repo:

https://github.com/MahmoudAshraf97/whisper-diarization

Also, the README hasn't been updated for diarization, and it isn't working in Colab either.

Dec 01 '23 05:12 akashAD98

@Vaibhavs10 @akashAD98 Can you please provide example code or a notebook for diarization, something like this one: https://github.com/Vaibhavs10/insanely-fast-whisper/blob/main/notebooks/infer_faster_whisper_large_v2.ipynb

Dec 12 '23 02:12 olegchomp

Do we have any updates on running pyannote locally? Would it be possible to download the model and run it locally?

Dec 13 '23 17:12 omarsiddiqi224

@omarsiddiqi224 It already does. Once the weights have been downloaded the first time, it should work locally without passing the token.

Dec 17 '23 13:12 Vaibhavs10
