[Feature] Option to choose between Pyannote and NeMo for diarization

Open Arche151 opened this issue 6 months ago • 0 comments

First of all, @pluja, I want to thank you again for developing whishper (soon to be anysub)!

I basically check out the v4 branch every day because I'm so excited for anysub to be ready! :) And I can't believe that my feature request, user authentication, will actually be implemented. Thanks so much for that!

My new feature request probably comes way too late, considering how deeply WhisperX will be integrated into anysub and how much work you've put into the WhisperX API, but I want to try anyway.

I suggest adding the option to choose between Pyannote and Nvidia NeMo for diarization for two reasons:

  1. Unlike Pyannote, NeMo is truly open source, with no requirement to obtain and enter an authorization token.
  2. In my personal tests, and to my surprise, NeMo is way better than Pyannote at accurately diarizing speakers.

@MahmoudAshraf97 created whisper-diarization, which is partly based on WhisperX but uses NeMo for diarization.

I know that I am asking a lot here, but for the two reasons I stated, I would really appreciate it if you could still consider it.

Arche151, Aug 12 '24 12:08