Max Bain comments

Results: 37 comments

Does it support Chinese?

Added here (keeping the "zh" code): [c6fa7df](https://github.com/m-bain/whisperX/commit/c6fa7df3cc7caae68722363ef0736043f3d75450)

Enhancement: Possible to determine different speakers?

I am currently working on a robust way to do this. There are a few ways to do it; you can use pyannote-audio for diarization: https://github.com/pyannote/pyannote-audio Either you merge...

Enhancement: Possible to determine different speakers?

@MahmoudAshraf97 Ah yes, I saw this tutorial. I didn't know their diarization is better! I will test it on my data. I thought pyannote was the current best,...

Enhancement: Possible to determine different speakers?

@Fcabla I am not sure speech separation is needed unless you have a lot of overlapping speakers. So far I have had good results using the following: run whisperX and diarization separately....
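A minimal sketch of the "run separately, then merge" idea, assuming the transcript is a list of segment dicts with `start`/`end` in seconds and the diarization output has been flattened to `(start, end, speaker)` tuples. The helpers `overlap` and `assign_speakers` are hypothetical names, not whisperX's API; each segment simply takes the speaker whose turns overlap it for the longest total time.

```python
from typing import Dict, List, Optional, Tuple

# Assumed input shapes (illustrative, not whisperX's actual data structures):
#   segments: [{"start": float, "end": float, "text": str}, ...]
#   turns:    [(start, end, speaker_label), ...] from a separate diarization run
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Copy each segment and label it with the speaker whose turns overlap it
    the most in total; the label is None when no turn overlaps the segment."""
    labelled = []
    for seg in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, label in turns:
            ov = overlap(seg["start"], seg["end"], t_start, t_end)
            if ov > 0:
                totals[label] = totals.get(label, 0.0) + ov
        speaker: Optional[str] = max(totals, key=totals.get) if totals else None
        labelled.append({**seg, "speaker": speaker})
    return labelled


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.0, "text": "Hello there."},
        {"start": 4.0, "end": 7.5, "text": "Hi, how are you?"},
        {"start": 20.0, "end": 21.0, "text": "Anyone?"},
    ]
    turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 8.0, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(seg["speaker"], seg["text"])
```

Here the first segment is labelled `SPEAKER_00`, the second `SPEAKER_01` (3.3 s of overlap beats 0.2 s), and the third `None` because no turn covers it; a nearest-turn fallback would be one way to fill such gaps.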

Enhancement: Possible to determine different speakers?

@Fcabla I see. Yes, overlapping speech is a difficult problem; it is probably worth using speech separation only for the overlapping segments.
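Running separation only on overlapping segments first requires finding them. A sketch, assuming diarization turns as `(start, end, speaker)` tuples; `overlapping_regions` is a hypothetical helper that sweeps over turn boundaries and reports the spans where at least two different speakers are active. Which separation model to run on those spans is left open.

```python
from typing import Dict, List, Optional, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker) from diarization (assumed shape)


def overlapping_regions(turns: List[Turn]) -> List[Tuple[float, float]]:
    """Return time spans where at least two different speakers are active,
    merging spans that touch."""
    # Sweep-line events (time, delta, speaker). Sorting on (time, delta) puts
    # ends (-1) before starts (+1) at equal times, so back-to-back turns
    # do not count as overlap.
    events = []
    for start, end, speaker in turns:
        if end > start:
            events.append((start, 1, speaker))
            events.append((end, -1, speaker))
    events.sort(key=lambda e: (e[0], e[1]))

    active: Dict[str, int] = {}  # speaker -> number of currently open turns
    regions: List[Tuple[float, float]] = []
    region_start: Optional[float] = None
    for time, delta, speaker in events:
        active[speaker] = active.get(speaker, 0) + delta
        if active[speaker] == 0:
            del active[speaker]
        if len(active) >= 2 and region_start is None:
            region_start = time  # a second distinct speaker just became active
        elif len(active) < 2 and region_start is not None:
            if time > region_start:
                if regions and regions[-1][1] == region_start:
                    regions[-1] = (regions[-1][0], time)  # extend a touching span
                else:
                    regions.append((region_start, time))
            region_start = None
    return regions


if __name__ == "__main__":
    print(overlapping_regions([(0.0, 5.0, "A"), (4.0, 8.0, "B"), (8.0, 10.0, "A")]))
    # [(4.0, 5.0)]
```

Overlapping turns from the same speaker are counted once, since the active set is keyed by speaker label.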

Enhancement: Possible to determine different speakers?

@holynuts The first attempt at including diarization is in the recent commit d395c21b8399cb2f29643a75f91469917cdbb991

Enhancement: Possible to determine different speakers?

You might find this better than pyannote on your data: https://github.com/JaesungHuh/SimpleDiarization But it depends, and it ought to be constrained to whisperX sentences, as in Appendix Sec. A (page 13) of https://www.robots.ox.ac.uk/~vgg/publications/2023/Han23/han23.pdf
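One simple way to constrain speaker labels to whisperX sentences is a duration-weighted majority vote over the per-word labels inside each sentence. This is an illustrative sketch, not the method described in the linked appendix; it assumes each word is a dict with `start`, `end` and a possibly `None` `speaker`, and `sentence_speaker` / `constrain_to_sentences` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed shape (illustrative): a sentence is a list of word dicts like
#   {"word": "hi", "start": 0.0, "end": 0.3, "speaker": "SPEAKER_00" or None}


def sentence_speaker(words: List[Dict]) -> Optional[str]:
    """Duration-weighted majority vote over the per-word speaker labels."""
    weight: Dict[str, float] = defaultdict(float)
    for w in words:
        if w.get("speaker") is not None:
            weight[w["speaker"]] += max(0.0, w["end"] - w["start"])
    return max(weight, key=weight.get) if weight else None


def constrain_to_sentences(sentences: List[List[Dict]]) -> List[List[Dict]]:
    """Return copies in which every word carries its sentence's majority speaker."""
    out = []
    for words in sentences:
        label = sentence_speaker(words)  # one label per sentence
        out.append([{**w, "speaker": label} for w in words])
    return out
```

Weighting by duration rather than word count keeps a few short, misattributed words from flipping a sentence's label.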

Fix catastrophic timestamp drifting from negative duration via clamping
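The title only names the fix (clamping). A sketch of what such clamping could look like, assuming segments as dicts with `start`/`end` in seconds; `clamp_segments` is hypothetical and not the code from that fix. It forces non-negative durations, keeps start times non-decreasing (an extra assumption), and optionally clips both timestamps to the audio length.

```python
from typing import Dict, List, Optional


def clamp_segments(segments: List[Dict], audio_duration: Optional[float] = None) -> List[Dict]:
    """Return copies of segments whose timestamps are clamped so that:
    - start is >= 0 and never earlier than the previous segment's start,
    - end is never earlier than start (no negative durations),
    - both lie within [0, audio_duration] when audio_duration is given."""
    clamped = []
    prev_start = 0.0
    for seg in segments:
        start = max(seg["start"], prev_start)  # never step back in time
        end = max(seg["end"], start)           # duration can't go negative
        if audio_duration is not None:
            start = min(start, audio_duration)
            end = min(end, audio_duration)
        clamped.append({**seg, "start": start, "end": end})
        prev_start = start
    return clamped


if __name__ == "__main__":
    segs = [{"start": 0.0, "end": 2.0}, {"start": 3.0, "end": 2.5},
            {"start": 2.8, "end": 4.0}, {"start": 9.0, "end": 12.0}]
    print([(s["start"], s["end"]) for s in clamp_segments(segs, audio_duration=10.0)])
    # [(0.0, 2.0), (3.0, 3.0), (3.0, 4.0), (9.0, 10.0)]
```

Clamping each segment locally stops one bad (negative-duration) segment from pushing later timestamps backwards, at the cost of producing some zero-length segments.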

OOM && Large GPU usage

Still some incoherent timestamps in the srt file