
Speaker diarization with pyannote.audio?

Open • hbredin opened this issue 2 years ago • 7 comments

I am the creator of pyannote.audio speaker diarization toolkit.

I understand that you went with @josepatino's PyBK because of its speed, but I'd love to see the pyannote.audio pretrained pipeline integrated into audapolis.

Would that be of interest to you? I'd love to help in some way!

hbredin • Jun 02 '22 12:06

Hey, thanks for reaching out. We would be 100% interested in integrating this. When I started looking at speaker diarization, I also noticed pyannote.audio, but since there was no pretrained pipeline at the time, we decided against it.

pajowu • Jun 02 '22 13:06

Do you think it would be possible to extend the SpeakerDiarization pipeline so that the hook reports not only the individual steps of the pipeline, but also the progress within certain steps? This would be a huge benefit for us.
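A minimal sketch of what within-step progress reporting could look like, assuming the hook keeps being called once with the step name and its output when a step finishes, and is additionally called during long steps with hypothetical `total` and `completed` keyword arguments. `ProgressRecorder` and `toy_segmentation_step` are illustrative names, not part of pyannote.audio.

```python
from typing import Any, List, Optional, Tuple


class ProgressRecorder:
    """Hypothetical hook: records (step_name, completed, total) for every call."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Optional[int], Optional[int]]] = []

    def __call__(self, step_name: str, step_artefact: Any = None,
                 file: Any = None, total: Optional[int] = None,
                 completed: Optional[int] = None) -> None:
        self.events.append((step_name, completed, total))
        if total is not None and completed is not None:
            # In-step call: report the fraction of work done so far
            print(f"{step_name}: {completed}/{total} ({100 * completed / total:.0f}%)")
        else:
            # End-of-step call: only the step name and its artefact are known
            print(f"{step_name}: done")


def toy_segmentation_step(chunks: List[str], hook) -> List[int]:
    """Hypothetical long-running step that reports progress after each chunk."""
    results = []
    for i, chunk in enumerate(chunks, start=1):
        results.append(len(chunk))  # stand-in for the real per-chunk work
        hook("segmentation", None, total=len(chunks), completed=i)
    hook("segmentation", results)  # final call: step finished, artefact available
    return results


hook = ProgressRecorder()
toy_segmentation_step(["a", "bb", "ccc"], hook)
```

Because the extra keyword arguments are passed on every in-step call, a hook written for a step-only interface would need to accept them (for example via `**kwargs`).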

pajowu • Jun 02 '22 13:06

I have been meaning to add this kind of progress hook for the online demo, but it never really reached the top of my priority list.

These are the two steps that account for most of the processing time:

hbredin • Jun 02 '22 14:06

FYI, I just released a much faster and more accurate version of the pyannote.audio speaker diarization pipeline. It still does not expose the progress of the individual steps, but this is now on my TODO list (though with no ETA).

hbredin • Jul 21 '22 07:07

Wow, I just tried it (and opened https://github.com/pyannote/pyannote-audio/pull/1185/files to add progress reporting). The results are really impressive 😍

pajowu • Dec 08 '22 19:12

I started integrating it and stumbled upon a problem that I'm not yet sure how to solve, so if you have any ideas, @hbredin, I'd be very interested: audapolis currently assumes that only one "speaker" talks at any time, whereas pyannote-audio supports multiple simultaneous speakers. It therefore produces overlapping speaker segments.

Since changing audapolis to support multiple simultaneous speakers is too much work for now, I'm trying to "flatten" the output of pyannote-audio to one speaker at a time. Do you have a suggestion on how to do that properly?

pajowu • Dec 17 '22 02:12

Nothing built into pyannote comes to mind.

You'd have to postprocess the pyannote.core.Annotation instance returned by the pipeline:

  1. Remove any segment fully contained in a larger segment:
     [------A-------]   ==> [------A-------]
         [--B--]
  2. Split partially overlapping segments at the midpoint of their overlap:
     [----A----]     ==> [----A--]
          [----B----]            [--B----]
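The two rules above can be sketched in plain Python on `(start, end, label)` triples, such as those one could collect from a `pyannote.core.Annotation` via `itertracks(yield_label=True)`. `flatten` is a hypothetical helper, not part of pyannote, and ties between identical segments are resolved by keeping the first one (an assumption).

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start, end, speaker label), e.g. in seconds


def flatten(segments: List[Segment]) -> List[Segment]:
    """Hypothetical helper: reduce overlapping speech to one speaker at a time."""
    # Rule 1: drop every segment fully contained in a longer one.
    # Identical segments contain each other, so only the first one is kept.
    kept = []
    for j, (sj, ej, label) in enumerate(segments):
        contained = any(
            k != j and sk <= sj and ej <= ek and (ek - sk > ej - sj or k < j)
            for k, (sk, ek, _) in enumerate(segments)
        )
        if not contained:
            kept.append([sj, ej, label])

    # With no containment left, sorting by start also sorts by end,
    # which is why handling consecutive pairs left to right is enough.
    kept.sort(key=lambda seg: (seg[0], seg[1]))

    # Rule 2: cut each remaining overlap at its midpoint.
    for prev, nxt in zip(kept, kept[1:]):
        if prev[1] > nxt[0]:
            middle = (prev[1] + nxt[0]) / 2
            prev[1] = middle
            nxt[0] = middle
    return [tuple(seg) for seg in kept]
```

For the second diagram, `flatten([(0.0, 10.0, "A"), (5.0, 15.0, "B")])` returns `[(0.0, 7.5, "A"), (7.5, 15.0, "B")]`. Note that rule 1 discards a short interjection entirely when it falls inside another speaker's turn.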

Or you could clip the output of the speaker counting step to be at most 1.

import numpy as np
count.data = np.clip(count.data, 0, 1)  # at most one active speaker per frame

should do the trick...
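Roughly speaking, the pipeline uses the speaker count to decide, frame by frame, how many of the most active speakers to keep, so capping it at 1 leaves at most one speaker per frame. The plain-Python illustration below is a simplified sketch of that selection under this assumption; `keep_most_active` is a hypothetical helper, not pyannote's code, and the activations and counts are made up.

```python
from typing import List


def keep_most_active(activations: List[List[float]], count: List[int]) -> List[List[int]]:
    """For each frame, mark the count[t] highest-scoring speakers as active (1)."""
    binary = []
    for scores, k in zip(activations, count):
        # Speaker indices sorted from most to least active in this frame
        ranked = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
        active = set(ranked[:k])
        binary.append([1 if s in active else 0 for s in range(len(scores))])
    return binary


# Three frames, two speakers; the second frame has an overlap (count == 2)
activations = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6]]
count = [1, 2, 1]
clipped = [min(max(c, 0), 1) for c in count]  # pure-Python equivalent of np.clip(count, 0, 1)

print(keep_most_active(activations, count))    # [[1, 0], [1, 1], [0, 1]]
print(keep_most_active(activations, clipped))  # [[1, 0], [1, 0], [0, 1]]
```

In the overlapping frame, only the more active speaker survives, which matches the one-speaker-at-a-time assumption in audapolis.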

hbredin • Dec 18 '22 14:12