pyannote-audio
How does diarization assign speakers in overlapped speech?
How does speaker diarization assign speakers in overlapped speech? I saw that overlapping speech is masked in pyannote.audio.pipelines.speaker_diarization at line 294. I also wonder about the relation between speech segmentation, resegmentation, and speaker diarization, and what purpose each serves.
I recommend you read this paper, which is an online variant of what is currently implemented in the develop branch.
Thanks for your reply! I have already read this paper, but the speaker-assignment method seems different from it. I don't understand why the clustering step in your diarization pipeline does not compute embeddings over overlapping speech intervals, yet still produces correct predictions. Or did I miss something that happens after the overlapping speech intervals are masked?
Overlapping frames are only masked for computing the embeddings, not for the rest of the pipeline. Anyway, I plan to write a technical report describing the approach and will share it once it is ready.
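For illustration, here is a minimal sketch of what "masking overlapped frames for embedding extraction" can look like. This is not the actual pyannote code; the `segmentation` and `features` arrays are hypothetical stand-ins for frame-level speaker activations and frame-level features.

```python
# Minimal sketch (NOT pyannote's implementation): pool a per-speaker
# embedding while excluding frames where more than one speaker is active.
import numpy as np

rng = np.random.default_rng(0)
num_frames, num_speakers, dim = 100, 3, 16

# Hypothetical binarized per-frame speaker activations, shape (frames, speakers)
segmentation = (rng.random((num_frames, num_speakers)) > 0.7).astype(float)
# Hypothetical frame-level features from an embedding model, shape (frames, dim)
features = rng.standard_normal((num_frames, dim))

# Frames where more than one speaker is active count as overlapped speech.
overlap = segmentation.sum(axis=1) > 1

embeddings = []
for s in range(num_speakers):
    # Keep only frames where speaker s is active AND there is no overlap,
    # so the pooled embedding is not contaminated by other speakers.
    mask = segmentation[:, s].astype(bool) & ~overlap
    if mask.any():
        embeddings.append(features[mask].mean(axis=0))
    else:
        embeddings.append(np.zeros(dim))  # no clean frames for this speaker

embeddings = np.stack(embeddings)
print(embeddings.shape)  # (num_speakers, dim)
```

Note that the overlapped frames are only excluded from this pooling step; the segmentation output for those frames is still available downstream, which is how overlapped regions can still receive speaker labels.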
Thank you again, I really appreciate it! So assigning speakers in overlapped speech does not rely on embeddings but on other mechanisms. Where can I find that code?
My data has many speakers talking within a short time (about 1 second), and most of them overlap with others, so without fine-tuning those segments are often predicted incorrectly. Before fine-tuning, I want to fully understand these mechanisms.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.