whisper-diarization: Diarization issue: all dialogues are assigned to Speaker 0 only.

Below is the audio file to reproduce the issue: Audio.

Actual output:

Speaker Name,in,out,Text

Speaker 0,00:04:41.4,00:07:32.15,You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You You I will give you my feedback.

Speaker 0,00:07:36.0,00:07:36.7,Okay.

Speaker 0,00:07:36.8,00:07:36.22,"All right, dear."

Speaker 0,00:07:37.4,00:07:38.21,So let's start with today's class.

Speaker 0,00:07:39.7,00:07:54.15,"And we are going to do C six today and C five you did with some other teacher, right?"

Speaker 0,00:07:55.14,00:07:55.19,Yeah.

Speaker 0,00:07:56.13,00:07:56.22,Okay.

Speaker 0,00:07:57.10,00:08:00.22,"Yeah, because I was on well, so I canceled the class."

Speaker 0,00:08:00.23,00:08:02.7,So you did it with the other teacher.

Speaker 0,00:08:02.18,00:08:02.24,Yes.

Speaker 0,00:08:04.10,00:08:05.1,"Okay, great."

Speaker 0,00:08:05.7,00:08:06.9,So you understood that?

Speaker 0,00:08:08.12,00:08:10.11,Can you tell me you understood that?

Speaker 0,00:08:10.13,00:08:13.19,Can you tell me what concept did you learn in the last class?

Speaker 0,00:08:14.10,00:08:17.5,"Yeah, I didn't understand it."

Speaker 0,00:08:22.15,00:08:23.19,You didn't understand that?

Speaker 0,00:08:24.16,00:08:25.19,I understand it.

Speaker 0,00:08:26.7,00:08:27.21,"Okay, so what was it?"

Speaker 0,00:08:28.2,00:08:31.4,Can you tell me which game you created in that class?

Speaker 0,00:08:32.11,00:08:34.11,Chasing the mouse.

Speaker 0,00:08:34.16,00:08:38.7,"Oh, that's an interesting game."

Speaker 0,00:08:38.8,00:08:38.11,Yes.

Speaker 0,00:08:49.1,00:08:49.6,Good.

Speaker 0,00:08:49.7,00:08:49.23,Fantastic.

Oct 19 '23 11:10 manjunath7472

Yep, I got the same error. Have you found the cause?

Nov 01 '23 09:11 v-nhandt21

No solution yet?

Nov 23 '23 16:11 solucionesuno

I have the same issue and investigated it. It appears that the "Speaker 0" label on every line comes directly from the underlying diarization model in the NeMo toolkit: nemo.collections.asr.models.msdd_models.NeuralDiarizer. So the bug is in the NeMo toolkit, not in this library. We might all be better off using pyannote for the diarization.

Nov 29 '23 22:11 rbracco
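One way to check the observation above, that the single label comes straight from the diarizer rather than from the later word-to-speaker mapping, is to count the distinct labels in the RTTM file the diarizer writes. NeMo writes its predictions as RTTM files (typically in a `pred_rttms` folder under the configured output directory); the exact path inside whisper-diarization's temporary folder is an assumption and should be checked locally. The minimal sketch below assumes the standard NIST RTTM layout (onset in field 4, duration in field 5, speaker label in field 8, counting from 1); the function name `speaker_durations` is hypothetical.

```python
from collections import defaultdict


def speaker_durations(rttm_text: str) -> dict[str, float]:
    """Sum speech time (seconds) per speaker label in RTTM-formatted text.

    Assumes standard NIST RTTM SPEAKER records, whose whitespace-separated
    fields are (0-based index):
      0 type, 1 file id, 2 channel, 3 onset, 4 duration,
      5 ortho, 6 subtype, 7 speaker name, 8 confidence, 9 lookahead
    """
    totals: dict[str, float] = defaultdict(float)
    for line in rttm_text.splitlines():
        fields = line.split()  # split() also tolerates repeated spaces
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        duration = float(fields[4])
        speaker = fields[7]
        totals[speaker] += duration
    return dict(totals)
```

Reading the diarizer's RTTM file and passing its contents to `speaker_durations` returns one entry per speaker label. If only one label appears, the collapse to a single speaker already happened inside the diarization model; if several appear, the problem lies further down the pipeline.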

I found that the problem comes from the quality of the NeMo model: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/diar_msdd_telephonic

Since this model is only trained on telephonic speech, diarization performance on other acoustic conditions might show a degraded performance compared to telephonic speech.

Dec 01 '23 03:12 v-nhandt21

I had the same issue with Demucs. Disabling it (--no-stem) helped.

Dec 15 '23 13:12 kalisgd0
