pyannote-audio

How does diarization assign speakers in overlapped speech?

Open · YuGuiwe opened this issue 2 years ago · 4 comments

How does speaker diarization assign speakers in overlapped speech? I saw that overlapped speech is masked in pyannote.audio.pipelines.speaker_diarization, line 294. I am also unsure about the relationship between speech segmentation, resegmentation, and speaker diarization, and what each of them is for.

YuGuiwe · Jun 13 '22 06:06

I recommend you read this paper, which describes an online variant of what is currently implemented in the develop branch.

hbredin avatar Jun 13 '22 07:06 hbredin

Thanks for your reply! I have already read this paper, but the speaker assignment method seems different from the one implemented here. I don't understand how the clustering step in your diarization pipeline can still assign the correct speakers to overlapping speech intervals when it does not compute embeddings for them. Or did I miss something that happens after the overlapping speech intervals are masked?

YuGuiwe avatar Jun 13 '22 07:06 YuGuiwe

Overlapping frames are only masked for computing the embeddings, not for the rest of the pipeline. Anyway, I plan to write a technical report describing the approach and will share it once it is ready.
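To make the reply above concrete, here is a toy NumPy sketch of the general idea. It is not pyannote's code. It assumes the segmentation model outputs binary activations of shape `(num_chunks, num_frames, num_local_speakers)`, and it uses a plain average of frame features as a stand-in for a real speaker embedding model. The function names and the `local_to_global` mapping, which plays the role of the clustering output, are hypothetical. Overlapped frames are dropped only when embeddings are computed. Once each local speaker has been mapped to a global cluster, the full segmentation, overlap included, is used to build the output.

```python
import numpy as np


def overlap_free_mask(segmentation: np.ndarray) -> np.ndarray:
    """Keep a local speaker's frame only when that speaker is the sole active one.

    segmentation: (num_chunks, num_frames, num_local_speakers) binary activity.
    """
    num_active = segmentation.sum(axis=-1, keepdims=True)  # (chunks, frames, 1)
    return segmentation * (num_active == 1)


def masked_embeddings(features: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Toy stand-in for an embedding model: mean feature over unmasked frames.

    features: (chunks, frames, dim); mask: (chunks, frames, local_speakers).
    Returns (chunks, local_speakers, dim). NaN where a speaker has no clean frame.
    """
    weights = mask.astype(float)
    summed = np.einsum("cfk,cfd->ckd", weights, features)
    counts = weights.sum(axis=1)[..., None]  # (chunks, local_speakers, 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, summed / counts, np.nan)


def reconstruct(segmentation: np.ndarray, local_to_global: np.ndarray,
                num_global: int) -> np.ndarray:
    """Map local speakers to global clusters, keeping ALL frames (overlap too).

    local_to_global: (chunks, local_speakers) cluster index, -1 if unassigned.
    """
    num_chunks, num_frames, num_local = segmentation.shape
    out = np.zeros((num_chunks, num_frames, num_global), dtype=segmentation.dtype)
    for c in range(num_chunks):
        for k in range(num_local):
            g = local_to_global[c, k]
            if g >= 0:
                out[c, :, g] = np.maximum(out[c, :, g], segmentation[c, :, k])
    return out


if __name__ == "__main__":
    # One chunk, four frames, two local speakers; frames 1 and 2 overlap.
    seg = np.array([[[1, 0], [1, 1], [1, 1], [0, 1]]])
    mask = overlap_free_mask(seg)  # overlapped frames removed for embeddings
    feats = np.arange(8, dtype=float).reshape(1, 4, 2)
    print(masked_embeddings(feats, mask))
    # Suppose clustering mapped local speaker 0 -> global 1 and 1 -> global 0.
    print(reconstruct(seg, np.array([[1, 0]]), num_global=2))
```

In this toy example, frames 1 and 2 contribute nothing to either embedding, yet both global speakers are still active there in the reconstructed output. The overlap information comes from the segmentation model, not from the embeddings.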

hbredin avatar Jun 13 '22 08:06 hbredin

Thank you again, I really appreciate it! So speaker assignment in overlapping regions does not rely on embeddings but on some other mechanism. Where can I find the code for it?

My data has many speakers taking short turns (about 1 second each), and most of these turns overlap with other speakers. Without fine-tuning, these segments are often predicted incorrectly. Before fine-tuning, I want to fully understand these mechanisms.

YuGuiwe avatar Jun 13 '22 08:06 YuGuiwe

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] · Aug 12 '22 18:08