Max Bain

37 comments by Max Bain

Added here (keeping the "zh" code): [c6fa7df](https://github.com/m-bain/whisperX/commit/c6fa7df3cc7caae68722363ef0736043f3d75450)

I am currently working on a robust way to do this. There are a few approaches; you can use pyannote-audio for diarization: https://github.com/pyannote/pyannote-audio Either you merge...

@MahmoudAshraf97 Ah yes, I saw this tutorial; I didn't know their diarization is better! I will test it on my data. I thought pyannote was the current best,...

@Fcabla I am not sure speech separation is needed unless you have a lot of overlapping speakers. I have had good results so far using: Run whisperX and diarization separately....
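The comment is cut off before it describes how the two outputs are combined. One common way to do it, sketched here as an assumption rather than whisperX's actual merge code, is to give each transcript segment the speaker whose diarization turns overlap it for the longest total time. The `Turn` class, the segment dict shape, and `assign_speakers` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarization turn (hypothetical shape): who spoke from start to end, in seconds."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker that overlaps it most.

    segments: list of dicts with "start", "end", "text" (assumed shape)
    turns: list of Turn objects from a separate diarization run
    Segments that overlap no turn get speaker None.
    """
    labelled = []
    for seg in segments:
        totals = {}
        for t in turns:
            ov = overlap(seg["start"], seg["end"], t.start, t.end)
            if ov > 0:
                totals[t.speaker] = totals.get(t.speaker, 0.0) + ov
        speaker = max(totals, key=totals.get) if totals else None
        labelled.append({**seg, "speaker": speaker})
    return labelled


if __name__ == "__main__":
    segs = [
        {"start": 0.0, "end": 4.0, "text": "hello there"},
        {"start": 4.5, "end": 9.0, "text": "hi, how are you"},
    ]
    turns = [Turn(0.0, 4.2, "SPEAKER_00"), Turn(4.2, 9.5, "SPEAKER_01")]
    for s in assign_speakers(segs, turns):
        print(s["speaker"], s["text"])
```

Summing overlap per speaker, rather than picking the single longest turn, keeps the choice stable when one speaker's speech is split into several short turns inside a segment.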

@Fcabla I see; yes, overlapping speech is a difficult problem. It is probably worth using speech separation only for the overlapping segments.
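To run speech separation only where it is needed, you first have to locate the overlapping regions. A minimal sketch, assuming diarization output as `(start, end, speaker)` tuples and that a single speaker's turns never overlap each other, is a sweep over start/end events that reports where two or more turns are active. `overlapped_regions` is a hypothetical helper, and the separation model itself is not shown.

```python
def overlapped_regions(turns):
    """Return (start, end) intervals where two or more speakers talk at once.

    turns: list of (start, end, speaker) tuples in seconds.
    Assumes turns of the same speaker do not overlap each other.
    """
    events = []
    for start, end, _speaker in turns:
        events.append((start, 1))   # a turn begins
        events.append((end, -1))    # a turn ends
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # back-to-back turns are not counted as overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    regions = []
    active = 0
    region_start = None
    for time, delta in events:
        active += delta
        if active >= 2 and region_start is None:
            region_start = time
        elif active < 2 and region_start is not None:
            if time > region_start:
                regions.append((region_start, time))
            region_start = None
    return regions
```

Only the audio inside the returned intervals would then be passed to a separation model; everything else can go straight to transcription with a single speaker label.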

@holynuts A first attempt at including diarization is in the recent commit d395c21b8399cb2f29643a75f91469917cdbb991

You might find this better than pyannote on your data: https://github.com/JaesungHuh/SimpleDiarization But it depends, and it ought to be constrained to whisperX sentences; see Appendix Sec. A (page 13) of https://www.robots.ox.ac.uk/~vgg/publications/2023/Han23/han23.pdf
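One simple reading of "constrained to whisperX sentences" is that every sentence gets exactly one speaker. The sketch below does that with a majority vote over word-level speaker labels. It is an illustrative assumption, not necessarily the procedure in the referenced appendix, and `sentence_speakers` plus the word dict shape are hypothetical.

```python
from collections import Counter


def sentence_speakers(words, sentences):
    """Assign one speaker per sentence by majority vote over its words.

    words: list of dicts {"start", "end", "speaker"} (assumed shape)
    sentences: list of (start, end) sentence spans in seconds
    A word belongs to a sentence if its midpoint falls inside [start, end).
    Ties go to the speaker encountered first, since Counter.most_common
    orders equal counts by first insertion. Empty sentences get None.
    """
    result = []
    for s_start, s_end in sentences:
        votes = Counter(
            w["speaker"]
            for w in words
            if s_start <= (w["start"] + w["end"]) / 2 < s_end
        )
        result.append(votes.most_common(1)[0][0] if votes else None)
    return result
```

Voting at the sentence level smooths out isolated mislabelled words, at the cost of missing genuine speaker changes inside a sentence.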

True; I guess I meant the best patch for now that doesn't require a new model.

Memory scales with the length of the input segment to the wav2vec2 model. With the current version, adding `--vad_filter` ensures that no segment longer than 30 s is input to the w2v model. You can...
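Because memory grows with input length, capping segment length bounds the memory of each alignment pass. The sketch below shows only that length constraint: it cuts any `(start, end)` segment longer than `max_len` seconds into equal consecutive pieces. It is a hypothetical helper, not whisperX's VAD code, which decides segment boundaries with a voice-activity model; a real splitter would prefer to cut at pauses rather than at fixed points.

```python
import math


def split_long_segments(segments, max_len=30.0):
    """Split (start, end) segments so that none exceeds max_len seconds.

    Segments already short enough are kept unchanged; longer ones are cut
    into the smallest number of equal-length consecutive pieces.
    """
    out = []
    for start, end in segments:
        duration = end - start
        if duration <= max_len:
            out.append((start, end))
            continue
        n = math.ceil(duration / max_len)
        step = duration / n
        for i in range(n):
            # Use the exact original end for the last piece to avoid float drift.
            piece_end = end if i == n - 1 else start + (i + 1) * step
            out.append((start + i * step, piece_end))
    return out
```

For example, a 65 s segment becomes three pieces of about 21.7 s each rather than 30 s + 30 s + 5 s, which keeps the pieces similar in size.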

What do you mean by incoherent timestamps? Could you be more specific, e.g. with an example?